A Novel Human Zinc Finger Protein That Interacts with the Core Promoter Element of a TATA Box-less Gene*

We describe a novel human cDNA isolated by target site screening of a placental expression library, using as a probe an essential element of a TATA box-less promoter corresponding to a pregnancy-specific glycoprotein gene. The cDNA encoded a predicted protein of 290 amino acids, designated core promoter-binding protein (CPBP), which has three zinc fingers (type Cys2-His2) at the end of its C-terminal domain, a serine/threonine-rich central region, and an acidic domain lying within the N-terminal region. Additional sequence analysis and database searches revealed that only the zinc finger domains are conserved (60–80% identity) in other transcription factors. In cotransfection assays, CPBP increased transcription from a minimal promoter containing its natural DNA-binding site. Moreover, a chimeric protein between CPBP and the Gal4 DNA binding domain also increased the activity of a heterologous reporter gene containing Gal4 DNA binding sites. Tissue distribution analysis of CPBP mRNA revealed that it is differentially expressed, with an apparent enrichment in placental cells. The DNA binding and transcriptional activity of CPBP, in conjunction with its expression pattern, strongly suggest that this protein may participate in the regulation and/or maintenance of the basal expression of PSG and possibly other TATA box-less genes.

The molecular mechanisms involved in the transcription of eukaryotic genes are controlled by the ordered interplay of DNA-protein and protein-protein contacts. The factors responsible for basal RNA-polymerase II transcription reaction are the core promoter elements

sequently exert their activities by modulating the basal transcriptional machinery (3).
So far, the most studied core promoter elements are the TATA box and the initiator, which are generally located at −25/−30 bp or encompass the transcription start site, respectively. The protein interacting with the TATA box (TATA box-binding protein) has been extensively studied (4), as have some initiator-binding proteins (e.g. TFII-I, USF, and YY1) (5,6). Although the TATA box is the prominent element in numerous promoters, many genes have initiator sequences in addition to the TATA box. Furthermore, other TATA box-lacking promoters contain only initiator elements capable of determining the correct transcription initiation site (2), whereas in certain promoters both elements are absent, adding more diversity to the early steps of promoter recognition (7). Among the genes that possess the latter promoter context are those corresponding to the pregnancy-specific glycoprotein (PSG) family and related members (i.e. carcinoembryonic antigen, biliary glycoprotein, and nonspecific cross-reacting antigen) (8-10). We have recently demonstrated that the PSG5 gene is driven by a sequence acting as an initiator-like element, which is recognized by nuclear proteins derived from different cell types. Most significantly, mutations affecting this sequence completely abolished both the formation of specific DNA-protein complexes and the transcriptional activity of the PSG5 promoter, independently of the cell type analyzed (11). In some cases, the promoter activity of other PSG genes has been directly associated with the interaction of transcription factors related to Sp1 or AP2 (9,10,12). However, it is still a matter of debate which sequence-specific DNA-binding proteins govern the complex expression patterns observed for PSG genes. These observations prompted us to delineate a strategy for the identification and molecular characterization of the DNA-binding proteins interacting with the crucial core promoter element found in the PSG5 gene.
Although the purification of the transcription factors had been possible using large amounts of placental or cultured cell extracts, the isolation of their cDNAs will facilitate the study of their structural and functional properties. Therefore, we have applied an approach that has been used successfully to clone cDNAs encoding DNA binding activities (13). In this study, we isolated a partial human cDNA by probing a placental expression library with the particular PSG5 promoter context, formerly named IPS-34 (11) and referred to, henceforth, as core promoter element (CPE). This cDNA encodes a novel protein whose features suggest that it functions as a transcription factor acting through its natural promoter context in a tissue- and gene-specific fashion.

* This work was supported by grants from the Consejo Nacional de Investigaciones Científicas y Tecnológicas de Argentina, the Consejo de Investigaciones Científicas y Tecnológicas de la Provincia de Córdoba, and the Secretaría de Ciencia y Técnica de la Universidad Nacional de Córdoba (SECyT). The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

EXPERIMENTAL PROCEDURES
Oligonucleotide Probes and Competitors-The sequences of the oligonucleotides used are as follows.
Screening of the cDNA Expression Library with a Specific DNA Target Site Probe-The human placental cDNA library constructed in the expression vector λgt11 was kindly provided by Dr. J. L. Millán (14). The library contains approximately 10^6 independent clones (95% recombinants) with an average insert size of 1.3 kb and was amplified as described previously (15). Protein replica membranes were prepared according to the procedure described by Singh (13). We applied our previous data concerning binding assays of immobilized DNA-binding proteins to improve the conditions for a reliable and specific DNA-protein interaction. Briefly, the filters were incubated for at least 5 h at 4°C in solution A (10 mM HEPES, pH 7.9, 50 mM KCl, 0.1 mM EDTA, 1 mM dithiothreitol, 10% (v/v) glycerol) supplemented with 5% nonfat dry milk. For screening, the treated filters were immediately immersed in separate containers holding aliquots (15-20 ml) of solution A supplemented with 0.25% nonfat dry milk, radioactive probe (0.5-2.0 × 10^6 cpm/ml of double-stranded CPE), and competitor DNAs (25 μg/ml of salmon sperm DNA and 0.35 μg of poly[d(I-C)]). After incubation for at least 5 h at 4°C with gentle agitation, the filters were washed twice (10 min each wash) with solution A and exposed for autoradiography. Selected phage plaques were further purified by four subsequent screenings, using as control clones those corresponding to wild type λgt11 and negative recombinant phages (nr). To confirm the identity and specificity of selected clones, a set of lysis plaque assays was carried out as described by Hoeffler et al. (16).
Preparation of Protein Extracts Derived from Escherichia coli Lysogens-The generation of E. coli (Y1089) lysogens was achieved according to the procedure described in Glover (17), and those bacterial clones harboring λgt11, CPBP, and nr phages were isolated and induced to synthesize their respective β-galactosidase fusion proteins by the addition of IPTG (10 mM). Bacterial cells derived from induced and uninduced cultures (3 ml) were centrifuged, and the pellets were resuspended in 100-μl aliquots of solution B (20 mM HEPES, pH 7.9, 100 mM KCl, 0.2 mM EDTA, 20% (v/v) glycerol, 1 mM dithiothreitol, 1 mM phenylmethylsulfonyl fluoride, and 0.5 mg/ml lysozyme). After 15 min, the concentration of NaCl was adjusted to 1 M, and the solutions were mixed by inverting the tubes every 3 min for a total period of 15 min. Cell lysates were centrifuged at 4°C for 30 min at 30,000 × g, and the supernatants were dialyzed using Millipore filters (type VS, 0.0025 μm) against buffer B (without lysozyme) for 1 h at 4°C. The dialyzed extracts were frozen in liquid nitrogen and stored at −70°C. Protein concentrations were determined by the method of Bradford (18).
Western and Southwestern Blot Assays-Crude protein extracts derived from lysogens were separated by 7.5% SDS-PAGE and electrophoretically transferred to nitrocellulose membranes. For immunoblot analysis, the blotting buffer contained 25 mM Tris, 190 mM glycine, and 20% methanol, while for the Southwestern assay the conditions were the same as those previously described (11). Blocking, washing, and incubation of the membranes with antibodies were carried out in Tris-buffered saline (20 mM Tris-HCl, pH 7.5, and 150 mM NaCl) containing 5% nonfat dry milk and 0.1% Tween 20. The monoclonal antibody against β-galactosidase (Sigma) was diluted 1:2500 and used as primary antibody. After incubation for 1 h at room temperature, the filter was further incubated (1 h at room temperature) with the secondary antibody, horseradish peroxidase-linked goat anti-mouse IgG (dilution 1:5000). The immune complexes were visualized by the ECL chemiluminescent detection system (Amersham Corp.) according to the instructions of the manufacturer.
Electrophoretic Mobility Shift Assay (EMSA)-EMSA experiments were carried out essentially as described previously (11). Briefly, the 32P-labeled oligonucleotide probes were incubated with bacterial extracts in a total volume of 20 μl that contained 10 mM HEPES, pH 7.9, 50 mM KCl, 0.1 mM EDTA, 10% (v/v) glycerol, 0.5 mM dithiothreitol, 0.5 mM phenylmethylsulfonyl fluoride, 1 μg of poly[d(I-C)], and 1 μg of denatured salmon sperm DNA. For competition experiments, the protein fractions were preincubated (10 min on ice) with the appropriate unlabeled oligonucleotide before the addition of the labeled probe. After a 15-min incubation on ice, the binding reactions were analyzed by electrophoresis on a nondenaturing 5% polyacrylamide gel.
Plasmid Constructions-The cDNA inserted in the CPBP clone was amplified by PCR, using the forward and reverse primers of λgt11 phage, and then digested with EcoRI (CPBP cDNA does not have internal EcoRI sites) and ligated into the Bluescript plasmid (Stratagene) previously digested with the same restriction enzyme. Other enzymes were used to map restriction sites in the CPBP cDNA to obtain different fragments, which were subsequently cloned into pUC18 vector. These subclones contained the following inserts, with the respective sizes indicated in parentheses: SalI-BamHI (900 bp), BamHI-PstI (200 and 500 bp), and BglII-PstI (400 bp). The inserts corresponding to these subclones and that obtained in the Bluescript plasmid were sequenced in both directions using the forward and reverse primers for pUC18 or the T3 and T7 primers for Bluescript. The sequencing reactions were performed by the chain termination method (19) using denatured double-stranded DNA templates.
The reporter plasmid PSG5-CAT was constructed by inserting the minimal promoter region of PSG5 (positions −254/−43 with respect to the translation start site) upstream from the chloramphenicol acetyltransferase (CAT) gene in the promoterless pBLCAT3 vector as described (20). The (17mer)2-TK-CAT reporter plasmid, which contains two Gal4 DNA binding sites in the thymidine kinase promoter, has been previously described (21). For expression of CPBP in cultured cells, the complete cDNA was subcloned into the EcoRI site of p plasmid downstream from the cytomegalovirus promoter and enhancer. A Gal4-CPBP chimeric plasmid was constructed by cloning the complete CPBP cDNA into the pSG424 plasmid (22).
Cell Culture, Transfections, and CAT Assays-COS-7 cells were grown in Dulbecco's modified Eagle's medium, supplemented with 5% fetal calf serum, streptomycin (0.1 mg/ml), and penicillin (100 units/ml). The cells were transfected either using the lipofectamine method following the manufacturer's recommendations (Life Technologies, Inc.) or by calcium phosphate coprecipitation (23) with the amounts of recombinant DNA indicated in the figure legends, adjusted to a total DNA amount of 16 μg with Bluescript DNA. Cells were harvested after 48 h, and protein extracts were prepared and assayed for CAT activity as described previously (24). The transfection efficiency was controlled by β-galactosidase staining of COS-7 cells transfected with the pCH110 expression plasmid (Pharmacia Biotech Inc.) as described (25). To compare the CAT activities, the amounts of cell extracts were normalized to equivalent protein concentrations. Each transfection experiment was repeated at least three times with different effector plasmid concentrations. Percentage of acetylation of chloramphenicol was determined by thin layer chromatography followed by scintillation counting.
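The quantification at the end of this procedure reduces to two ratios. The following is a minimal sketch, not the authors' own calculation: the function names and all counts are hypothetical, and it assumes the conventional definitions, with percent acetylation taken as the acetylated fraction of total chloramphenicol counts and fold activation as the ratio of acetylation with and without effector plasmid.

```python
def percent_acetylation(acetylated_cpm, unacetylated_cpm):
    """Percentage of chloramphenicol recovered in acetylated forms."""
    total = acetylated_cpm + unacetylated_cpm
    if total <= 0:
        raise ValueError("total counts must be positive")
    return 100.0 * acetylated_cpm / total


def fold_activation(effector_pct, basal_pct):
    """Ratio of acetylation with effector plasmid to reporter-only acetylation."""
    if basal_pct <= 0:
        raise ValueError("basal acetylation must be positive")
    return effector_pct / basal_pct


# Hypothetical scintillation counts (cpm), for illustration only.
basal = percent_acetylation(acetylated_cpm=500, unacetylated_cpm=9500)
activated = percent_acetylation(acetylated_cpm=2000, unacetylated_cpm=8000)
print(basal, activated, fold_activation(activated, basal))
```

Because extracts are normalized to equivalent protein concentrations before the assay, the ratio compares like with like; otherwise a per-microgram correction would be needed first.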
Multiple Tissue Northern Blot-The Northern transfer of poly(A)+ RNAs derived from several human tissues was purchased from Clontech and used according to the recommendations of the manufacturer. The DNA template employed in the preparation of the CPBP radiolabeled probe was obtained by PCR amplification of a 500-bp fragment from CPBP cDNA. The PCR reaction was performed with an internal primer specific for CPBP cDNA (5′-CGGGATCCTCTAGAAGGTTCCCTGCTC-3′) and the Bluescript reverse primer, using as a template a recombinant Bluescript plasmid carrying the complete CPBP cDNA. As an internal control, the blot was probed with the GAPDH and β-actin cDNAs. The generation of radiolabeled fragments was performed as described previously (26). To quantify the expression of CPBP mRNA in the different tissues, the bands detected by autoradiography in the Northern blot assay were scanned with a Shimadzu dual wavelength chromatoscanner. The obtained CPBP area values were normalized to the cognate GAPDH area values.
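The normalization step can be sketched as follows. This is an illustrative example only: the function name and the band-area values are hypothetical, not densitometry data from this study, and the tissue list is abbreviated.

```python
def normalize_to_control(target_areas, control_areas):
    """Divide each tissue's target band area by its loading-control band area."""
    normalized = {}
    for tissue, area in target_areas.items():
        control = control_areas[tissue]
        if control <= 0:
            raise ValueError(f"no usable control signal for {tissue}")
        normalized[tissue] = area / control
    return normalized


# Hypothetical scanned band areas (arbitrary units).
cpbp_areas = {"placenta": 900.0, "lung": 300.0, "kidney": 0.0}
gapdh_areas = {"placenta": 1000.0, "lung": 1200.0, "kidney": 1100.0}

levels = normalize_to_control(cpbp_areas, gapdh_areas)
ranked = sorted(levels, key=levels.get, reverse=True)
print(levels, ranked)
```

Repeating the same call with β-actin areas as the control is a simple way to check that the tissue ranking does not depend on the choice of reference transcript.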
Cell-free Transcription and Translation of CPBP cDNA-The Bluescript plasmid carrying the CPBP cDNA was digested with suitable restriction enzymes to obtain the sense and antisense orientation with respect to the T7 and T3 phage promoters, respectively. The linearized plasmids were subsequently transcribed in vitro by T7 or T3 RNA polymerases. After transcription, the mRNAs (1-2.5 μg) were used separately for translation reactions in a rabbit reticulocyte lysate, according to the specifications recommended by the manufacturer (Promega). The reactions (25 μl) were performed in the presence of 1.5 μl of [14C]leucine (353 mCi mmol^-1). The radiolabeled polypeptides were resolved by SDS-PAGE and visualized by fluorography.

RESULTS
Isolation of CPBP cDNA-Since we have previously succeeded in detecting placental DNA-binding proteins able to interact with a core promoter element (11) on solid supports (i.e. nitrocellulose membranes), an expression library containing human placental cDNAs was screened according to well established conditions. Specifically, 5 × 10^5 clones were processed under such conditions, allowing us to isolate two λgt11 clones containing inserts of 1.35 kb encoding core promoter element-binding proteins, which were designated CPBP. On the basis of restriction mapping and partial sequencing, we confirmed that both clones contained identical inserts, which is not unusual in amplified libraries. The DNA binding activities exhibited by the β-galactosidase fusion proteins were reproducibly observed in a lysis plaque assay using the purified CPBP clones to infect the bacterial lawn. Indeed, the chimeric proteins induced in λgt11 clones harboring CPBP activities, but neither the wild type λgt11 nor a negative recombinant, recognized the core promoter element probe (Fig. 1A). Likewise, other plaque lifts were negative for binding when probed with several unrelated sequences contained in the nonspecific competitor oligonucleotides (not shown). A DNA-protein complex was detected in EMSA only with extracts derived from the IPTG-induced CPBP lysogen, indicating that it was a product of the lacZ fusion gene (Fig. 1B). The incubation of the induced CPBP extracts with a monoclonal antibody against β-galactosidase, before or after the addition of the CPE probe, produced a supershift of the DNA-protein complex (Fig. 2A, lanes 2 and 3, respectively). This experiment clearly identifies the protein in the complex as a β-galactosidase chimeric protein.
Even more informative was the competitive EMSA analysis performed with extracts derived from CPBP lysogenic bacteria and the wild type or a mutant version of the core promoter element contained in CPE and CPEmut oligonucleotides, respectively (Fig. 2B). Formation of the DNA-protein complex was completely abolished by the addition of a 100-fold molar excess of unlabeled CPE oligonucleotide (Fig. 2B, lane 3). To test the specificity of the binding, a 100-fold molar excess of the CPEmut and three other different nonrelated oligonucleotides, named NS1, NS2, and NS3, were incubated with the induced bacterial extracts prior to the addition of the labeled CPE probe. The results of the competition experiments are shown in lanes 4-7 (Fig. 2B). In all of the binding reactions with nonrelated competitor oligonucleotides, the formation of the CPE-protein complex remained unaffected, indicating that the chimeric protein possesses sequence specificity. Additionally, when the CPEmut DNA was used as probe (Fig. 2B, lanes 8 and 9), the DNA-protein complex was no longer detected. Taken together, these results indicate that the interaction of the fusion protein with the CPE motif is highly specific and that an essential sequence within this element, which was substituted in CPEmut (i.e. 5′-ACCCAT-3′ → 5′-GATATC-3′), is crucial for DNA recognition.
As mentioned before, we established that CPBP1 and CPBP2 contained identical inserts; thus, we continued our experiments with only one cDNA (CPBP1). This clone was PCR-amplified and subcloned in the Bluescript plasmid to perform further manipulations. Several subclones were obtained, and the complete sequence (1350 bp) was determined on both strands of each insert. The nucleotide sequences of CPBP cDNA and its deduced protein are depicted in Fig. 3. On the basis of sequence analyses and database searches, we concluded that the CPBP cDNA and its conceptual polypeptide have not been reported previously; they show only partial homology with some transcription factors (see below).
Structural Features of CPBP-The CPBP cDNA has an open reading frame of 290 amino acids, which in turn constitutes a polypeptide with a calculated molecular mass of 33 kDa and an isoelectric point of 9.404. In contrast, the presence of the AUG codon in the farthest 5′ position matches almost completely (8 of 9 nucleotides) with the consensus sequence (cc(g/a)ccAUGg) proposed by Kozak (27,28) to be an essential element in the scanning model of translation. This initiation codon may, therefore, allow for the synthesis of a protein with a primary structure of 283 amino acids. The molecular mass of the fusion β-galactosidase-CPBP protein was determined by SDS-PAGE to be approximately 150 kDa. This chimeric protein was identified with two different methodologies, taking advantage of the β-galactosidase portion and the DNA binding activity of CPBP (Fig. 4, A and B, respectively). In Western blots, the induced fusion protein was revealed with a monoclonal antibody against β-galactosidase and migrated with a relative mass of 150 kDa (Fig. 4A). Accordingly, the Southwestern blotting analysis indicated that the same band accounts for the DNA-binding activity when probed with the CPE oligonucleotide (Fig. 4B). These results confirmed the identity of the fusion protein and also allowed us to deduce the molecular mass of the CPBP fraction by subtracting the β-galactosidase portion (116 kDa) from the 150 kDa corresponding to the chimeric protein; thus, the CPBP fraction is responsible for approximately 34 kDa of the total molecular mass. To confirm that the predicted CPBP protein can be synthesized in a eukaryotic system, we performed in vitro translation of the CPBP mRNA (1.35 kb). To this end, the mRNA encoding the CPBP protein or the antisense was transcribed in vitro, and the cognate protein was translated in a rabbit reticulocyte lysate system. The synthesized polypeptide had an apparent molecular mass of 32 kDa in SDS-PAGE (Fig. 4C), which is in good agreement with the calculated mass deduced from its primary structure and from that determined by Western and Southwestern blot assays. The shorter polypeptides detected in the presence of the sense mRNA may be due to incomplete synthesis of the CPBP protein (Fig. 4C). In conclusion, the CPBP mRNA has the potential to be translated into a polypeptide of 32 kDa in a eukaryotic system.
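The mass estimates compared above can be cross-checked with simple arithmetic. The sketch below uses the 150-kDa and 116-kDa values quoted in the text; the average residue mass of 110 Da is a rule-of-thumb assumption introduced here for illustration, not a value used by the authors, and it gives only a rough estimate compared with a calculation from the actual sequence.

```python
FUSION_KDA = 150.0       # apparent mass of the beta-galactosidase-CPBP fusion (from the text)
BETA_GAL_KDA = 116.0     # beta-galactosidase portion (from the text)
AVG_RESIDUE_DA = 110.0   # assumption: rule-of-thumb average mass per amino acid residue


def mass_by_subtraction(fusion_kda, partner_kda):
    """Mass attributable to the cDNA-encoded part of a fusion protein, in kDa."""
    return fusion_kda - partner_kda


def mass_from_length(n_residues, avg_residue_da=AVG_RESIDUE_DA):
    """Rough polypeptide mass in kDa estimated from its length alone."""
    return n_residues * avg_residue_da / 1000.0


print(mass_by_subtraction(FUSION_KDA, BETA_GAL_KDA))  # CPBP portion of the fusion
print(round(mass_from_length(283), 1))                # from the first AUG
print(round(mass_from_length(290), 1))                # full open reading frame
```

All three estimates fall within a few kilodaltons of the 32-kDa band observed after in vitro translation, which is about the resolution that SDS-PAGE sizing allows.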
As a first step toward the identification of characteristic domains for DNA-binding proteins, several sequence alignments were performed with other known transcription factors. These analyses allowed us to determine that CPBP has three contiguous zinc fingers (type Cys2-His2) at the end of its C-terminal portion (Fig. 5, A and B). The cysteine and histidine residues as well as other conserved amino acids are present in the zinc finger structures of CPBP (Fig. 5A). Additionally, the variations of key residues in different arrays of zinc fingers determine the different affinities and specificities displayed by such DNA binding domains (29). The target sequences recognized by several related zinc finger proteins (e.g. EKLF and BTEB) share strong similarity with each other and have been proposed to be a guanine-rich binding site (30, 31, 33). In this regard, the sequence recognized by CPBP fulfills these criteria, and CPBP can therefore be included in this subset of zinc finger proteins.
These findings can be rationalized in the light of other reports. For instance, Klevitt (34) proposed that certain basic residues in the finger are responsible for the contacts with guanine nucleotides present within triplets, which in turn constitute the binding site. These amino acids are termed X, Y, and Z and are depicted in Fig. 5C. Therefore, assuming that Klevitt's predictions apply for CPBP contacts with DNA, the target site can be deduced to be 3′-GGN GNG GGN-5′. In fact, this sequence is in perfect correspondence with the natural context detected by previous in vitro approaches (11) (Fig. 5C). These data, along with the results obtained by competitive EMSA using a mutant CPE, are compelling evidence that the zinc fingers of CPBP are responsible for the specific DNA binding to the guanine-rich CPE.
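A degenerate site such as 3′-GGN GNG GGN-5′ can be searched for in a sequence by expanding N to any base and scanning both strands. The sketch below is illustrative only: the helper names are hypothetical, and the test sequences are invented for the example, since the actual CPE oligonucleotide sequence is not reproduced in this section.

```python
import re

# Only the symbols needed for the predicted site; N stands for any base.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_site(seq, pattern_3to5="GGNGNGGGN"):
    """Find matches to a site written 3'->5', scanning both strands 5'->3'."""
    pattern_5to3 = pattern_3to5[::-1]  # rewrite the site in 5'->3' orientation
    body = "".join(IUPAC[base] for base in pattern_5to3)
    regex = re.compile(f"(?=({body}))")  # lookahead so overlapping sites are reported
    hits = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for match in regex.finditer(strand_seq):
            hits.append((strand, match.start(), match.group(1)))
    return hits


# Invented sequences: one containing a guanine-rich match, one without.
print(find_site("TTAGGGAGTGGAA"))
print(find_site("TTAGATATCGGAA"))
```

Positions on the "-" strand are reported relative to the reverse-complemented sequence, so they would need converting before being compared with coordinates on the original strand.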
Other interesting features can also be distinguished from the amino acid composition analysis of CPBP. For instance, the central region of the CPBP polypeptide, located between positions 114 and 208, is serine/threonine-rich and could be a potential target for post-translational modifications such as phosphorylation (Fig. 5B). In addition, the high content of acidic residues found from amino acid 19 to 112 confers a predominantly negative charge (net charge = −11) on the N-terminal domain (Fig. 5B). The possible role of such structures will be interpreted under "Discussion." Finally, the presence of potential phosphorylation sites, notably in serine and/or threonine residues, which constitute 22.6% of the CPBP polypeptide, most probably indicates a post-translational regulation of CPBP protein.

FIG. 4. ... (lanes 1 and 3) or IPTG-induced (lanes 2 and 4) crude lysates from wild type λgt11 (wt) and CPBP lysogens, as indicated, were analyzed by Western blot (A) and Southwestern blot (B). A, the bacterial extracts were resolved by SDS-PAGE and electrotransferred to nitrocellulose. The immobilized proteins were incubated with a monoclonal antibody against β-galactosidase, and the bands were visualized by a chemiluminescent system. B, a similar assay was carried out as in A, except that the proteins were treated as described under "Experimental Procedures" and visualized by incubation of the nitrocellulose membrane with the CPE-labeled probe and autoradiography. The arrows indicate the migration and the estimated molecular mass (150 kDa) of the β-galactosidase-CPBP fusion protein. C, the pBluescript plasmid containing the CPBP cDNA was linearized, and the sense or the antisense mRNAs were transcribed in vitro by the T7 or T3 RNA polymerase-dependent system, respectively. The cognate mRNAs were translated in a cell-free reticulocyte lysate in the presence of 14C-labeled leucine as the radioactive precursor. The arrowhead indicates the migration of the labeled, in vitro synthesized CPBP polypeptide. Numbers on the right refer to the positions of the molecular mass markers in kDa.

FIG. 5. Structural analysis of the CPBP polypeptide. A, CPBP contains a zinc finger domain of the Cys2-His2 type. Alignment of the putative zinc finger sequences from CPBP and BTEB or EKLF proteins. x, Xenopus; h, human; r, rat; m, mouse. Identical regions (one-letter code) are represented by white letters boxed in black. The conserved cysteine and histidine residues are shown in black and gray boxes, respectively. Letters outside boxes correspond to the nonconsensus regions. Hyphens correspond to gaps introduced to optimize the alignment. B, the polypeptide corresponding to the CPBP open reading frame is schematically represented. The gray boxes depict the amino acids involved in the Cys2-His2-type zinc finger structure. The black box and the white box containing minus symbols or S (serine) and T (threonine) represent the predicted acidic and serine/threonine-rich domains, respectively. The bar with numbers indicating amino acid positions for each domain is shown at the top of the scheme. C, sequence comparison between the natural and predicted DNA-binding site for the CPBP zinc finger domain. The peptide sequences (one-letter code) of the three putative CPBP zinc finger domains, numbered from 1 to 3, are shown in white letters boxed in black. The X, Y, and Z represent the key coordinates used for the prediction of DNA-binding sites of known zinc finger domains according to Klevitt (34) (noted above the amino acid sequence). The sequences of the deduced and natural DNA-binding sites for the CPBP zinc finger domain are shown at the bottom (N indicates any of the four deoxynucleotides).
Functional Analysis of CPBP-To determine the impact of CPBP on transcription, its cDNA was cotransfected in COS-7 cells with a reporter plasmid carrying the CAT gene driven by the PSG5 promoter sequence (positions −254/−43) (20). This reporter vector contains the CPE sequence located between positions −150 and −124, overlapping one cluster (−130/−134) of the two transcription start sites described for the PSG5 gene. Subsequently, CAT activity was measured, and the results obtained are shown in Fig. 6A. Increasing amounts of effector plasmid in the presence of adjusted quantities of reporter DNA resulted in a clear and dose-dependent stimulation of CAT activity (Fig. 6B, lanes 1-4). To further extend our results, we constructed a chimeric version of CPBP fused with the DNA binding domain of Gal4 and tested the activity of this chimera on a promoter containing two Gal4 binding sites (Fig. 6B, lanes 5-8). Taken together, these approaches indicate that CPBP is capable of activating transcription approximately 4-fold on either homologous or heterologous promoters.
Expression of CPBP Transcript in Human Tissues-Hybridization of a radiolabeled 5′ fragment (outside the zinc fingers) of the CPBP cDNA to a Northern transfer of poly(A)+ RNAs from different human tissues allowed us to determine that the CPBP transcript is differentially expressed as a unique species of 4.5 kb (Fig. 7). Considering the normalized mRNA levels, the CPBP transcript appears to be enriched in placental cells (Fig. 7, lane 3). In contrast, the transcript in other tissues (i.e. pancreas, lung, liver, heart, and skeletal muscle) was present in decreasing amounts or was undetectable (kidney and brain). In sum, the in vivo mRNA levels observed for CPBP in different human organs indicate that transcriptional regulation plays a crucial role in CPBP expression. It is important to remark that essentially the same results were obtained when the values for CPBP mRNA expression were normalized using the β-actin mRNA instead of GAPDH (not shown).

DISCUSSION
We have previously reported the characterization of an essential element of the PSG5 gene promoter that contributes to its basal activity in different cell types. Furthermore, we have demonstrated that this element is recognized by distinct proteins (11). In an attempt to identify these proteins, we have performed a target site screening of a placental expression library, and a cDNA encoding a polypeptide with specific CPE binding activity, designated CPBP, was isolated. DNA sequencing and subsequent database searches indicated that the CPBP cDNA encodes a previously undiscovered protein that bears a three-zinc finger (Cys2-His2) motif at its C-terminal domain.
We have also determined that a fusion protein constituted by CPBP and β-galactosidase exerted its DNA binding activity by recognizing a particular GC-rich sequence of the CPE element. It is worth mentioning that although the competitive EMSAs were performed with a fusion protein, essentially the same results were achieved with either a bacterially synthesized CPBP or a CPBP polypeptide translated in a reticulocyte lysate. So far, we have sequenced 1.35 kb of the CPBP cDNA, and when it was used as a probe in a Northern blot analysis a single 4.5-kb band was identified (Fig. 7). At present, we are using distinct strategies to isolate the full-length cDNA to decipher the complete primary structure of CPBP. Nevertheless, a consensus sequence for translational initiation encompasses the first AUG codon at the 5′ end of CPBP cDNA, although the reading frame remains open to its upstream side (Fig. 3). It seems likely that this AUG codon serves as the regular translation start site, since it is recognized at high efficiency in a reticulocyte lysate to produce a polypeptide that migrates in SDS-PAGE as a band of 32 kDa. These data are in good agreement with the calculated molecular mass of the CPBP protein as deduced from its primary structure. However, the coding potential for CPBP mRNA is not limited to the 32-kDa protein as mentioned above. In a previous report, we determined that at least two placental proteins with molecular masses of 78 and 53 kDa were involved in the formation of specific DNA-protein complexes (11). In this work, we found that CPBP has the same sequence requirements for DNA binding as the complexes detected with protein extracts from cultured cells and has the ability to stimulate the activity of both homologous and heterologous promoters about 4-fold. Considering these data, it would be essential to identify the in vivo synthesized CPBP to correlate its molecular mass to those previously described (Ref. 11 and this work). To this end, we are currently working to produce antibodies against specific CPBP peptides.
Regarding the in vivo expression patterns of CPBP mRNA, we found that its level of transcription varies in different human tissues, as determined by Northern blot analysis. The highest mRNA level was detected in placenta, which is the organ where PSG genes are predominantly expressed. Thus, one might speculate that the CPBP factor plays a pivotal role in the regulation of PSG expression in placental cells. In contrast, the CPBP mRNA level in other organs was moderate or undetectable. These observations are consistent with the idea that CPBP expression is differentially regulated at the transcriptional level and that CPBP actions may lead to a tissue-specific regulation of its target gene(s).
Having identified a novel DNA-binding protein that interacts with fundamental promoter elements, we aimed at searching for the potential transcriptional activity of CPBP. In cotransfection experiments, the functional properties of CPBP were determined, indicating that this novel protein increases transcription levels. We also asked whether the activating functions of CPBP are specific for the PSG5 promoter or can be elicited in different promoter contexts. As a first step in this direction, a chimera consisting of CPBP and the DNA binding domain of Gal4 was constructed and cotransfected with a reporter containing Gal4 binding sites. On the basis of the data shown here, CPBP is able to stimulate transcription either in its natural context or in a heterologous promoter context.
The PSG5 promoter contains neither a TATA box nor typical pyrimidine-rich initiator motifs; however, one core promoter element (CPE box) overlaps the farthest transcription start site and is essential for DNA-protein interactions and promoter activity in different cell types (11). Interestingly, the CPE box spans a region positioned between nucleotides −24 and −50 bp upstream from the major transcription start site (+1), thus resembling the TATA box location (−30/−35 bp) found in most eukaryotic genes. This observation suggests that CPBP most probably acts as a tethering factor that serves to recruit and/or maintain the transcription machinery within the boundaries of the promoter and allows for accurate transcriptional initiation from this particular TATA-less promoter. Moreover, the CPE box is crucial to sustain roughly similar levels of PSG5 transcriptional activity in different cell types, regardless of whether the cells express PSG genes, suggesting that this sequence and its cognate CPBP are mainly involved in basal promoter activity. These observations are compatible with the CPBP-dependent transcription enhancement observed, since CPBP may behave as a rate-limiting factor for basal activity, whereas overexpression of this factor may overcome the rate-limiting step and consequently sustain an activated transcription level.
A comparison at the amino acid level of the CPBP sequence with its most closely related transcription factors (i.e. EKLF and BTEB) revealed a high degree of conservation of the zinc finger regions (Fig. 5A), but no other significant similarities were found in the rest of the molecule. Regulatory proteins that share homology in the zinc finger domains, most specifically in the residues contacting the DNA (positions X, Y, and Z; Fig. 5A), may act as potential regulators of transcription by interacting with their cognate sequence as well as with other similar motifs. For instance, transcription factors such as EKLF and BTEB, whose transcriptional activities and DNA-binding specificities have been well established (29–31),² may contact the CPE motif present in the PSG5 promoter via their zinc fingers. Alternatively, other promoters that are targets for the aforementioned regulatory proteins may also be regulated by CPBP, which in turn would contribute to a higher diversity in promoter recognition.
Figure legend (Northern blot): CPBP mRNA levels were normalized to the control GAPDH mRNA. The Northern blot assay was carried out with poly(A)+ RNA from the tissues indicated and hybridized with the labeled CPBP cDNA probe. As a control, the same blot was hybridized with a labeled probe for GAPDH mRNA. The migration positions of the CPBP (4.5 kb) and GAPDH mRNAs are indicated with arrows.
Interestingly, the central and N-terminal portions of the CPBP protein are clearly distinct from each other and bear some attractive features. In this regard, the central region of CPBP (amino acids 114–208) is a serine/threonine-rich domain (28.7%) that might be involved in activation or posttranslational regulatory pathways. In contrast, the N-terminal domain (residues 19–112) contains acidic amino acids forming a predominantly negative interface with a net charge of −11. These acidic residues are interspersed among numerous hydrophobic amino acids (40% of the domain), of which approximately 50% correspond to leucine and isoleucine. This type of structure has been proposed to play an important role in the process of transcriptional activation, where the interactions of the activation domain with its target protein might be mediated by hydrophobic forces (3). Additionally, the basic domains present in some general transcription factors (e.g. TFIIB, TATA box-binding protein) have been postulated to be the targets of acidic activators (32,35). Taken together, these features delineate a modular structure for CPBP, which might function as a tethering factor that binds TATA box-less promoters or as an activator itself by mediating interactions, e.g. via its acidic domain, with one (or several) general transcription factors. Further investigations of the structure-function relationship of CPBP, as well as full elucidation of its role in transcription, will shed more light on the molecular mechanisms that are regulated by this novel protein.
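The domain descriptors quoted above (net charge, hydrophobic content, and the leucine/isoleucine share) are simple composition statistics. The following is a minimal sketch of how such values could be tallied for any protein segment. The residue classes are simplifying assumptions (histidine treated as neutral, termini ignored, an illustrative hydrophobic set), and the example sequence is a hypothetical toy string, not the CPBP sequence.

```python
# Composition statistics for a protein segment (one-letter amino acid codes).
# ASSUMPTIONS: these residue classes are illustrative choices, not the
# authors' definitions; histidine and chain termini are ignored for charge.
ACIDIC = set("DE")
BASIC = set("KR")
HYDROPHOBIC = set("AVLIMFWP")
LEU_ILE = set("LI")


def count_in(seq: str, residues: set) -> int:
    """Count how many residues of seq belong to the given class."""
    return sum(1 for r in seq.upper() if r in residues)


def net_charge(seq: str) -> int:
    """Basic minus acidic residue count, a crude net charge at neutral pH."""
    return count_in(seq, BASIC) - count_in(seq, ACIDIC)


def fraction(seq: str, residues: set) -> float:
    """Fraction of seq made up of residues in the given class."""
    return count_in(seq, residues) / len(seq) if seq else 0.0


# HYPOTHETICAL toy segment used only to exercise the functions.
toy = "DDEELLIIAKV"
n_hydrophobic = count_in(toy, HYDROPHOBIC)
leu_ile_share = count_in(toy, LEU_ILE) / n_hydrophobic if n_hydrophobic else 0.0

print("net charge:", net_charge(toy))
print("hydrophobic fraction:", round(fraction(toy, HYDROPHOBIC), 3))
print("Leu/Ile share of hydrophobic residues:", round(leu_ile_share, 3))
```

Applied to residues 19–112 of the actual sequence, the same tallies would be expected to give values comparable to those reported, although the exact numbers depend on which residues are counted as hydrophobic.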