A gene for a novel zinc-finger protein expressed in differentiated epithelial cells and transiently in certain mesenchymal cells.

We have identified a novel zinc-finger protein whose mRNA is expressed at high levels in the epidermal layer of the skin and in epithelial cells in the tongue, palate, esophagus, stomach, and colon of newborn mice. Expression in epithelial cells is first detected at the time of their differentiation during embryonic development. In addition, during early embryonic development there is expression in mesenchymal cells of the skeletal primordia and the metanephric kidney which is later down-regulated. The expression pattern suggests that the protein could be involved in terminal differentiation of several epithelial cell types and could also be involved in early differentiation of the skeleton and kidney. The carboxyl terminus of the protein contains three zinc fingers with a high degree of homology to erythroid krüppel-like factor and binds to DNA fragments containing CACCC motifs. The amino-terminal portion of the protein is proline and serine-rich and can function as a transcriptional activator. The chromosomal location of the gene was mapped using mouse interspecific backcrosses and was shown to localize to mouse chromosome 4 and to cosegregate with the thioredoxin gene.

During embryogenesis, a single-cell zygote gives rise to a complex organism composed of many cell types which differentiate from their precursors in an intricate process involving the coordinate action of various cytokines, hormones, and growth factors which activate the expression of specific transcription factors in the cell. Many cell-specific transcription factors involved in the differentiation of tissues have been identified: these fall into several broad classes that include helix-loop-helix proteins, homeodomain proteins, and zinc-finger proteins. Zinc-finger transcription factors generally contain a cluster of zinc-finger motifs which together bind to specific DNA sequences; they can be divided into several classes based on the sequence and position of the cysteine and histidine residues and other conserved amino acids. The TFIIIA subclass of zinc-finger proteins is characterized by an amino acid motif (Cys-X₂₋₄-Cys-X₁₂-His-X₃₋₄-His) that coordinates zinc ions and is involved in DNA binding. Some zinc-finger proteins of the TFIIIA subclass are widely expressed, such as Sp1 and the basic transcription element binding factor, BTEB 1 (1,2). However, other members of the class have a very restricted pattern of expression, such as the Wilms' tumor gene product (WT-1), which is expressed in the urogenital system during early development, or the erythroid krüppel-like factor (EKLF), which is highly expressed in erythroid cells (3,4).
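The TFIIIA-type spacing rule quoted above can be expressed as a pattern search. The following is a minimal sketch, assuming the motif is read literally as Cys, 2-4 residues, Cys, 12 residues, His, 3-4 residues, His; the function name and the toy sequence are hypothetical and are not taken from the EZF sequence.

```python
import re

# Literal reading of the motif in the text: Cys-X(2-4)-Cys-X(12)-His-X(3-4)-His.
# The lookahead lets overlapping candidate fingers be reported as well.
C2H2_FINGER = re.compile(r"(?=(C.{2,4}C.{12}H.{3,4}H))")


def find_c2h2_fingers(protein):
    """Return (0-based start, matched text) for every motif-like stretch."""
    return [(m.start(), m.group(1)) for m in C2H2_FINGER.finditer(protein.upper())]


# Hypothetical toy sequence built to satisfy the spacing rule exactly once.
toy = "MK" + "C" + "AA" + "C" + "A" * 12 + "H" + "AAA" + "H" + "GS"
print(find_c2h2_fingers(toy))
```

A match only shows that the cysteine and histidine spacing is compatible with a TFIIIA-type finger; it says nothing about zinc coordination or DNA binding, which the experiments below address.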
Several experiments including gene targeting techniques have demonstrated that zinc-finger proteins are important for the proper differentiation of tissues in which they are expressed. EKLF is a zinc-finger protein highly expressed in erythroid cells which has been shown to bind to a CACCC box sequence found in the β-globin promoter (4). Loss of functional EKLF in mice results in lethal anemia since β-globin is not synthesized (5,6). Krox-20 (or EGR-2) is a zinc-finger protein specifically expressed in rhombomeres 3 and 5 of the hindbrain during early embryonic development (7). Loss of functional Krox-20 in mice results in malformation of these portions of the brain (8,9). The Krox-20 gene is also expressed in Schwann cells and in hypertrophic chondrocytes and osteoblasts of the skeleton. Indeed, there are also defects in myelination and endochondral ossification in Krox-20 knockout mice (10,11). WT-1 is a zinc-finger protein expressed during embryogenesis in the kidney and genital tissues (3). In mice, loss of functional WT-1 protein results in failure of the kidney and gonads to form (12). Some cases of Wilms' tumor in humans arise via inactivating mutations of the WT-1 gene (13,14). Patients with Wilms' tumor also frequently show signs of genital abnormalities (15).
One major interest of our research is the differentiation of fibroblasts and osteoblasts from their mesenchymal precursors. As a marker for these lineages we have studied the expression of the mouse α2(I) collagen gene. Previous evidence from our laboratory indicated that high levels of expression of the mouse α2(I) collagen promoter in transfected cells were dependent on three G/C-rich regions in the proximal promoter which appeared to bind several members of the zinc-finger family of transcription factors (16). In an effort to clone cDNAs for zinc-finger proteins which were expressed at high levels in primary mouse embryonic fibroblasts and which might play a role in activating the collagen promoter, we cloned a novel protein which we call epithelial zinc finger (EZF). The gene for this protein is expressed in certain mesenchymal cells that give rise to the skeleton and kidney. However, expression in these cell types is a transient phenomenon and is not found in newborn mice in which the collagen genes are actively expressed. Instead, expression of the EZF message switches almost exclusively to epithelial cell types (hence the name epithelial zinc finger) during the later stages of embryogenesis at the time of their differentiation. Expression in epithelial cells is also high in newborn mice. Our results suggest that the EZF protein is a transcriptional activator that may play a role in the differentiation of epithelial cells in many organs including the skin, tongue, esophagus, stomach, and colon. EZF may also function in the development of the skeleton and kidney during early steps in the formation of these organs.

EXPERIMENTAL PROCEDURES
Cloning of EZF-First-strand cDNA was prepared from poly(A)+ RNA of primary mouse fibroblasts derived from embryos at E13.5. This cDNA was then used as a template in a polymerase chain reaction (17) with an oligo(dT) primer and two degenerate primers (CACATCAGGACCCA(C/T)ACIGG(A/G)GA and CACATCCGIACCCA(T/C)ACIGG(T/C)GA) homologous to an amino acid sequence (HIRTHTGE) that is conserved among several members of the zinc-finger family. The PCR products were hybridized with a second degenerate oligonucleotide (ACCGGCGA(A/G)AA(A/G)CCITT(T/C)G(A/C)ITG) homologous to an overlapping region of the zinc-finger domain (TGEKPFAC). DNA from 50 random positive clones was sequenced and compared with the GenBank database. Two of the clones contained partial cDNAs for a novel zinc-finger protein which we call EZF. A ZAP library was then constructed from poly(A)+ RNA of whole mouse embryos at E15.5 and screened with the insert from one of the EZF clones isolated above. Three positive clones of 2.5 to 2.8 kb encoding the complete open reading frame were isolated.
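The relationship between the two degenerate primers and the conserved peptide HIRTHTGE can be checked by expanding each primer and translating it. The sketch below is illustrative only: it assumes the standard genetic code, treats inosine (I) as able to pair with any of the four bases, and uses hypothetical helper names. The final two primer bases (GA) are an incomplete codon for the terminal Glu and are not translated.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def parse_primer(notation):
    """Turn notation such as 'CA(C/T)ACI' into allowed bases per position."""
    positions, i = [], 0
    while i < len(notation):
        if notation[i] == "(":
            end = notation.index(")", i)
            positions.append(notation[i + 1:end].replace("/", ""))
            i = end + 1
        else:
            # Assumption: inosine is modeled as matching any base.
            positions.append("ACGT" if notation[i] == "I" else notation[i])
            i += 1
    return positions


def expand(notation):
    """List every concrete sequence represented by a degenerate primer."""
    return ["".join(p) for p in product(*parse_primer(notation))]


def translate(dna):
    """Translate complete codons from position 0; trailing bases are ignored."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


PRIMER_1 = "CACATCAGGACCCA(C/T)ACIGG(A/G)GA"
PRIMER_2 = "CACATCCGIACCCA(T/C)ACIGG(T/C)GA"
for primer in (PRIMER_1, PRIMER_2):
    variants = expand(primer)
    print(primer, len(variants), {translate(v) for v in variants})
```

Under these assumptions every expanded variant of both primers translates to HIRTHTG, which is consistent with the primers having been designed against the HIRTHTGE sequence.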
Northern Analysis and in Situ Hybridization-Total RNA was isolated from various mouse tissues and cells as described (18). Twenty micrograms of total RNA was electrophoresed through formaldehyde gels, transferred to Zeta-Probe membranes (Bio-Rad), and hybridized with a probe containing a portion of the zinc-finger region and 3′-untranslated sequences.
To generate a probe for in situ analysis, we cloned a 160-bp fragment of the EZF cDNA encoding a nonconserved portion of the proline-rich domain into pBluescript KS between the NotI and SacI sites to yield the plasmid pBS160. DNA was then either linearized with SacI and transcribed with T3 polymerase to yield a sense RNA probe, or linearized with XbaI and transcribed with T7 polymerase to yield an antisense RNA probe. Transcription reactions included [35S]UTP as label. In situ hybridizations were performed as described previously (17) with minor modifications. Slides were exposed at 4°C for 12-14 days.
Glutathione S-Transferase Fusion Proteins and Gel Shift Analysis-A segment of the EZF cDNA coding for amino acids 321-474 was cloned into the EcoRI and XhoI sites of the vector pGEX-4T3 (Pharmacia, Piscataway, NJ) to generate the plasmid pND2, which produces an in-frame fusion of glutathione S-transferase (GST) with the zinc fingers of EZF. For isolation of GST proteins, bacteria were grown to early log phase and induced for 30-90 min with 1 mM isopropylthiogalactoside. The resuspended bacterial pellet was then sonicated in E buffer (12 mM Hepes, pH 8.5, 100 mM NaCl, 5% glycerol, 0.25 mM ZnCl2, 0.1 mM EDTA, 1% Triton X-100, 0.01% Nonidet P-40, 1 mM dithiothreitol, and the protease inhibitors phenylmethylsulfonyl fluoride at 1 mM and pepstatin, leupeptin, and aprotinin at 10 μg/ml). The supernatant was then incubated with glutathione-agarose resin (Sigma) for 30 min and washed with E buffer. Finally, the protein was eluted with 5 mM glutathione in E buffer.
For gel shift analysis, double-stranded oligonucleotides were labeled with Klenow enzyme and [32P]dCTP or with polynucleotide kinase and [γ-32P]ATP. Probe oligonucleotide (40,000-60,000 cpm) was incubated with 50-100 ng of recombinant GST-ND2 fusion protein in E-2 buffer (same as E buffer described above but without Triton X-100 and with 2 mg/ml bovine serum albumin) at room temperature for 20 min. The reaction products were run on 4% polyacrylamide gels containing 0.5× Tris borate-EDTA (TBE) buffer.
Transcriptional Activation-For activation studies, various segments of the EZF cDNA were cloned between the Asp718 and BamHI sites of the Gal4 expression vector, pSG424 (19), in-frame with the Gal4 DNA-binding domain. HeLa cells were grown in 5% horse serum, 5% fetal calf serum in Dulbecco's modified Eagle's medium in 8% CO2. One microgram of Gal4 expression plasmid was transfected into cells along with 10 μg of a reporter plasmid containing five Gal4 DNA-binding sites (20) and 5 μg of SVβ-Gal as an internal control. The cells were electroporated with a Gene Pulser electroporator (Bio-Rad) and harvested 48 h after transfection. Chloramphenicol acetyltransferase activities were measured using a liquid scintillation assay as described previously (21). β-Galactosidase activities were measured using a resorufin-β-D-galactopyranoside substrate at 30°C.
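Reporter assays of this kind are commonly interpreted by dividing each chloramphenicol acetyltransferase reading by the β-galactosidase reading from the same transfection and then expressing the result relative to a positive control. The sketch below shows that arithmetic only. The normalization scheme is an assumption about how such data are usually handled, not a description of the authors' calculation, and all readings and construct names are invented placeholders chosen so the arithmetic is easy to follow.

```python
def normalized_activity(cat, bgal):
    """Reporter (CAT) reading corrected for transfection efficiency (beta-gal)."""
    if bgal <= 0:
        raise ValueError("internal-control reading must be positive")
    return cat / bgal


def percent_of_reference(samples, reference):
    """Express each normalized activity as a percentage of the reference sample."""
    ref = normalized_activity(*samples[reference])
    return {name: 100 * normalized_activity(cat, bgal) / ref
            for name, (cat, bgal) in samples.items()}


# Hypothetical (CAT, beta-gal) readings in arbitrary units; not data from this study.
readings = {
    "positive_control": (5000.0, 2.0),
    "test_fusion": (1500.0, 1.5),
    "dbd_only": (50.0, 2.0),
}
relative = percent_of_reference(readings, "positive_control")
print(relative)
```

With these placeholder numbers the test fusion has a lower raw CAT reading than its normalized share suggests, which illustrates why the internal control matters when transfection efficiency differs between samples.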
Chromosome Localization-C3H/HeJ-gld and Mus spretus mice and [(C3H/HeJ-gld × Mus spretus)F1 × C3H/HeJ-gld] interspecific backcross mice were bred and maintained as described previously (22). M. spretus was chosen as the second parent in this cross because of the relative ease of detecting informative restriction fragment length variants in comparison with crosses made using conventional laboratory strains.
DNA was isolated from mouse organs, blotted to nylon membranes, and hybridized with a fragment of the EZF cDNA under stringent conditions as described previously (23). Gene linkage was determined by segregation analysis (24). Gene order was determined by analyzing all haplotypes and minimizing crossover frequency between all genes that were determined to be within a linkage group. This method resulted in determination of the most likely gene order (25). The mapping of reference loci in this interspecific cross (Tsha, Gabrr, Txn, and Jun) has been previously described (26-28).
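The ordering principle described above, choosing the gene order that requires the fewest crossovers across all backcross haplotypes, can be sketched as follows. The loci A, B, and C and the four toy haplotypes are hypothetical. The distance function assumes a simple binomial standard deviation, which may differ from the method of reference 24, and the recombinant counts used with it (7 and 19 of 114) are back-calculated from the distances reported in RESULTS rather than stated in the text.

```python
from itertools import permutations
from math import sqrt


def crossovers(order, haplotypes):
    """Count allele switches between adjacent loci, summed over all animals."""
    return sum(h[a] != h[b] for h in haplotypes for a, b in zip(order, order[1:]))


def best_order(loci, haplotypes):
    """Return the locus order that needs the fewest crossovers."""
    # An order and its reverse are equivalent, so keep one of each pair.
    candidates = [p for p in permutations(loci) if p <= p[::-1]]
    return min(candidates, key=lambda p: crossovers(p, haplotypes))


def map_distance(recombinants, meioses):
    """Recombination percentage and its binomial standard deviation."""
    r = recombinants / meioses
    return 100 * r, 100 * sqrt(r * (1 - r) / meioses)


# Hypothetical backcross haplotypes: "C" = C3H allele, "S" = spretus allele.
toy_haplotypes = [
    {"A": "C", "B": "C", "C": "C"},  # non-recombinant
    {"A": "S", "B": "S", "C": "S"},  # non-recombinant
    {"A": "C", "B": "S", "C": "S"},  # crossover between A and B
    {"A": "S", "B": "S", "C": "C"},  # crossover between B and C
]
print(best_order(["A", "B", "C"], toy_haplotypes))

# Assumed recombinant counts, inferred from the reported distances.
for count in (7, 19):
    distance, sd = map_distance(count, 114)
    print(count, round(distance, 1), round(sd, 1))
```

Under these assumptions the two intervals come out at 6.1 ± 2.2 and 16.7 ± 3.5 centimorgans, matching the values given for the Tsha/Gabrr to Ezf/Txn and Ezf/Txn to Jun intervals.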

RESULTS
Cloning of EZF-We sought to identify novel zinc-finger proteins of the TFIIIA subclass which were expressed in embryonic fibroblasts. We used a polymerase chain reaction strategy with degenerate oligonucleotides (17) to isolate partial cDNA clones of zinc finger proteins using cDNA from primary fibroblasts of 13.5-day mouse embryos as a template. Two of the partial cDNA clones isolated in this fashion encoded a novel zinc-finger protein which we denoted EZF based on its expression pattern (see below). The cDNA clones isolated by polymerase chain reaction were subsequently used to screen for clones containing the entire open reading frame of the protein.
Sequencing of the various cDNA clones showed the presence of three ATG codons near the 5′ end (Fig. 1A). The first of these ATG elements was followed by an open reading frame of 261 nucleotides capable of encoding an 87-amino acid peptide. However, the sequences surrounding this ATG codon are not in a favorable context for translation initiation as defined by Kozak (29). A second ATG codon in a different reading frame was located 599 base pairs from the 5′ end of the sequence shown in Fig. 1A. This ATG codon begins an open reading frame of 1449 nucleotides which would correspond to a 483-amino acid protein (Fig. 1A). However, this second ATG codon is also not in a very favorable context for translation initiation. It is, therefore, likely that the major translation product initiates at a third ATG codon located 27 nucleotides further downstream which better fits the consensus translation initiation site. Translation from this ATG codon would result in synthesis of a 474-amino acid protein (Fig. 1A) with a predicted molecular mass of 52 kDa. Several stop codons in all three frames are located upstream of the major open reading frame, indicating that the protein coding sequence does not extend further 5′.
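The reading-frame figures quoted above follow from simple arithmetic, sketched here. The lengths are taken from the text and divided by three as the text does; the average residue mass of 110 Da is a common rule of thumb introduced as an assumption, and the function names are hypothetical.

```python
AVG_RESIDUE_MASS_DA = 110  # assumption: rule-of-thumb average mass per residue


def orf_residues(orf_nt):
    """Number of amino acids encoded by an ORF length given in nucleotides."""
    if orf_nt % 3:
        raise ValueError("ORF length is not a whole number of codons")
    return orf_nt // 3


def residues_from_downstream_start(orf_nt, offset_nt):
    """Residues encoded when translation starts offset_nt further downstream."""
    return orf_residues(orf_nt - offset_nt)


def approx_mass_kda(residues):
    """Rough molecular mass estimate in kDa."""
    return residues * AVG_RESIDUE_MASS_DA / 1000


print(orf_residues(261))                           # first ATG
print(orf_residues(1449))                          # second ATG
print(residues_from_downstream_start(1449, 27))    # third ATG
print(approx_mass_kda(474))
```

The third ATG lies in the same frame as the second (27 is a multiple of 3), so the two predicted products differ only by nine amino-terminal residues.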
The predicted protein sequence (from either ATG-2 or ATG-3) contains three zinc fingers of the TFIIIA type at the extreme carboxyl terminus. As shown in Fig. 1B these zinc fingers have a high degree of homology with those present in lung krüppel-like factor, EKLF, and BTEB2. Since lung krüppel-like factor, EKLF, and BTEB2 are closely related in the DNA-binding domain, contain proline-rich amino-terminal regions, and are expressed in a tissue-restricted fashion, they have been proposed to fall into a separate subclass of zinc-finger proteins (30) that likely includes EZF as well. A lower but significant homology was also found with the zinc-finger regions of other proteins including BTEB, Sp1, WT-1, and the early growth response proteins EGR-1 and EGR-2. The first 384 amino acids of the protein contained a domain rich in proline (15%) and serine (13%) characteristic of certain transcriptional activation domains. Upstream of the zinc finger was a stretch of basic amino acids similar to a region in EGR-1 that has been shown to be important for nuclear localization (31).
Expression Pattern of the Ezf Gene-Northern analysis with total RNA from tissues of newborn mice showed a major mRNA of 3.5 kb. In tissues with high levels of expression such as skin and lung we could also detect a minor mRNA of 1.9 kb (Fig. 2A). The relationship of the 1.9- and 3.5-kb RNAs is not presently clear but they may be alternatively spliced forms or be derived from alternate usage of polyadenylation signals. Newborn mouse skin had the highest levels of expression in our Northern analysis; lower levels were detected in lung and much lower levels in heart, kidney, brain, and liver. In addition, we also detected relatively abundant expression in primary fibroblasts derived from mouse embryos at E13.5 which was the source of the original EZF cDNA clones (data not shown). To analyze the developmental expression pattern of the Ezf gene we performed Northern analysis of RNA from mice at a series of embryonic stages. There was no expression in 9.5-day embryos, a low level of expression in 11.5- and 13.5-day embryos, and a high level of expression in 15.5-day embryos (Fig. 2B).
To better localize the cells expressing the Ezf gene, we performed in situ hybridization with mouse embryos at various stages and also with tissues from newborn mice. As expected from the Northern analysis, no expression was detected in embryos at 9.5 days of development (data not shown). However, at 11.5 days of development we could detect expression in populations of undifferentiated mesenchymal cells in the nasal prominence and first branchial arch (Fig. 3A). Later during development mesenchymal cells that form the metanephric kidney (Fig. 3B) and the cartilaginous primordia of the skeleton (Fig. 3C, for example) showed a positive signal. In the skeletal primordia the EZF signal was confined to the peripheral layer of cells and was absent from more centrally located chondrocytic cells. In newborn mice we could detect little or no expression of the Ezf gene in mesenchymal cell types of various organs including bones and cartilages of the skeleton (data not shown). These observations suggest that expression of the Ezf gene in mesenchymal cell populations was a transitory phenomenon of development.
At E12.5 we detected expression of the EZF mRNA in epithelial cells in the dorsal surface of the tongue (data not shown). Expression in other epithelial cell types was very low until E15.5 when widespread expression in epithelial cells of the epidermis, vibrissae, oral mucosa, esophagus, and colon became apparent (see Fig. 4, A and B, for examples). In newborn mice, there was also widespread expression in various epithelial cells (Fig. 4C and Fig. 5). Our data indicate, however, that not all epithelial cell types express high levels of EZF since sections of the small intestine and bladder showed little or no signal (data not shown). High-power views of the epidermis indicated that the signal was much stronger in the differentiating suprabasal levels than it was in the mitotically active basal layer or in the underlying dermis (Fig. 5). Thus, expression of the EZF mRNA in epithelial cell types seemed to largely correlate with the onset of their differentiation.
Biochemical Characterization of the Protein-To test the ability of the EZF protein to bind to DNA, a recombinant fusion protein (GST-ND2) was generated containing GST fused in-frame to the zinc-finger region of EZF (amino acids 321-474). Using the GST-ND2 fusion protein, gel shift assays were performed with various oligonucleotides previously shown to bind to related zinc-finger proteins (Table I). The results show that the EZF fusion protein bound most efficiently to an oligonucleotide derived from the β-globin promoter containing a CACCC box sequence (Fig. 6A, lane 6). The EZF fusion protein could also bind well to an Sp1 site derived from the SV40 promoter (lane 5) and to a BTE element from the rat cytochrome P4501A1 promoter (lane 4) which is a target for the BTEB proteins. Another Sp1 site (lane 3) derived from the human immunodeficiency virus long terminal repeat bound the recombinant EZF protein weakly but there was no binding to a high-affinity WT-1-binding site (lane 2) or to a consensus EGR-binding site (lane 1). These results suggest that the EZF protein binds to a subset of G/C-rich sequences.
To determine whether the CACCC motif of the EKLF-binding site was critical for EZF binding we tested three different point mutations in this site (Table I) which were previously shown to abolish binding of EKLF (36). The EZF fusion protein bound very poorly or not at all to these three mutants, indicating that the bases in the CACCC motif were essential for high levels of binding (Fig. 6B). Moreover, none of the three mutant oligos was able to compete efficiently for binding to the wild-type CACCC-binding site (data not shown).
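At the sequence level, the logic of the mutant analysis can be illustrated by scanning an oligonucleotide for the CACCC pentamer on either strand and confirming that single-base substitutions within it remove the exact match. This is a string-matching sketch only and does not model binding affinity. The oligonucleotide below is hypothetical; the actual wild-type and mutant sequences are those listed in Table I.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_motif(seq, motif="CACCC"):
    """Return (0-based start, strand) for exact motif hits on either strand."""
    seq = seq.upper()
    rc = reverse_complement(motif)
    hits = []
    for i in range(len(seq) - len(motif) + 1):
        window = seq[i:i + len(motif)]
        if window == motif:
            hits.append((i, "+"))
        elif window == rc:
            hits.append((i, "-"))
    return hits


def point_mutants(seq, start, length):
    """All single-base substitutions within seq[start:start + length]."""
    mutants = []
    for pos in range(start, start + length):
        for base in "ACGT":
            if base != seq[pos]:
                mutants.append(seq[:pos] + base + seq[pos + 1:])
    return mutants


# Hypothetical oligonucleotide containing one CACCC box (not from Table I).
oligo = "AGCTCACCCTGA"
print(find_motif(oligo))
print(find_motif(reverse_complement(oligo)))
mutants = point_mutants(oligo, 4, 5)
print(len(mutants), all(find_motif(m) == [] for m in mutants))
```

Scanning both strands matters because a double-stranded probe presents the motif regardless of which strand is written out.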
We also wished to test the ability of the EZF protein to affect transcription of linked genes. Therefore, we fused various segments of the EZF protein in-frame with the DNA-binding domain of the yeast transcription factor Gal4 under the control of the SV40 promoter and enhancer and transfected them into HeLa cells along with a reporter gene containing five Gal4-binding sites. The potent acidic activation domain of VP16 fused to the Gal4 DNA-binding domain was transfected as a positive control. Fig. 7 shows that the proline/serine-rich region of EZF (EZF 1-360) provided a strong transcriptional activation function (about 40% of the level provided by Gal4-VP16). Neither the full-length protein (EZF 1-474) nor the zinc-finger region of EZF (EZF 363-474) could activate transcription from the same reporter when fused to Gal4. It may be that the full-length protein lacked activity because it had a higher affinity for endogenous target sites in HeLa chromatin than it did for the Gal4-binding sites in the reporter gene. Alternatively, the zinc-finger region of EZF might have masked its activation domain or contained additional domains that repressed the activity of the proline-rich region.
Mapping of the Ezf Gene-To determine the chromosomal location of the Ezf gene, we analyzed a panel of DNA samples from an interspecific cross for which over 900 genetic markers throughout the mouse genome have been characterized. The genetic markers included in this map span between 50 and 80 centimorgans on each mouse autosome and the X chromosome. Initially, DNA from the two parental mice, C3H/HeJ-gld and (C3H/HeJ-gld × Mus spretus)F1, was digested with various restriction endonucleases and hybridized with the EZF cDNA probe to identify restriction fragment length variants for use in haplotype analyses. Informative TaqI restriction fragment length variants were detected: C3H/HeJ-gld, 2.3 kb; Mus spretus, 2.4 kb.
Comparison of the haplotype distribution of the Ezf restriction fragment length variants indicated that this gene cosegregated, in all 114 meiotic events examined, with the thioredoxin (Txn) gene locus on mouse chromosome 4 (Fig. 8). The haplotype distribution of the genes localized to mouse chromosome 4 is shown in Fig. 8. The best gene order (25) ± the standard deviation (24) indicated the following gene order: (centromere) Tsha/Gabrr - 6.1 ± 2.2 centimorgans - Ezf/Txn - 16.7 ± 3.5 centimorgans - Jun. Tsha is the α chain of chorionic gonadotropin, Gabrr symbolizes the linked genes for γ-aminobutyric acid (GABA) receptor subunit rho1 and rho2, Txn is thioredoxin, and Jun is c-Jun. These reference loci have been previously mapped (26-28).

DISCUSSION
In this study we have described EZF, a novel member of the TFIIIA class of zinc-finger proteins, that has three carboxyl-terminal zinc fingers with a high degree of homology to those of the tissue-specific transcription factors lung krüppel-like factor, EKLF, and BTEB2. The amino-terminal portion of this protein is rich in proline and serine residues, which is typical of certain transcriptional activation domains. We showed that a recombinant bacterially synthesized protein containing the zinc-finger motif of the EZF protein fused to GST is capable of binding to several G/C-rich binding sites including a CACCC motif which is a target for the closely related transcription factor EKLF. Moreover, mutations in the CACCC motif which disrupt its interaction with EKLF also disrupt the binding of EZF, indicating that the two proteins recognize very similar DNA sequences. The proline/serine-rich region of EZF fused to a heterologous DNA-binding domain was able to function as a strong transcriptional activator having approximately 40% of the activity of the potent VP16 acidic activation domain. Together these data suggest that the EZF protein binds to a subset of G/C-rich sites similar to those recognized by EKLF and Sp1 and activates transcription through an amino-terminal proline/serine-rich domain.
Analysis of the expression pattern of the EZF mRNA showed that it was apparently first expressed at E11.5 in mesenchymal cells of the nasal prominence and first branchial arch. A little later expression was detected in mesenchymal cells surrounding the cartilaginous primordia of the skeleton as well as in the mesenchymal cells surrounding the metanephric kidney. Expression in these mesenchymal cell populations was detected at E13.5 and E15.5 as well but was almost undetectable in newborn mice. The mRNA for a related zinc-finger protein, WT-1, is also expressed during development of the kidney. However, WT-1 is expressed weakly in the mesenchymal blastema surrounding the kidney but is strongly expressed in differentiated kidney cells (3), a pattern opposite to that which we detected for EZF.
At E12.5 expression of the EZF mRNA was detected in epithelial cells of the dorsal surface of the tongue. Other epithelial cells began to express the EZF mRNA at 15.5 days of development. In newborn mice there were widespread high levels of expression in many epithelial cells such as those in the epidermis, vibrissae, tongue, palate, oral mucosa, esophagus, stomach, colon, and lung. There was, however, little or no expression in epithelial cells of the small intestine or bladder. Nor did we detect expression in any cells in the heart or kidney of newborn mice (data not shown). In the epidermis expression of the EZF mRNA seemed to be largely confined to differentiating cells and was not found in the basal cell layer which was mitotically active, suggesting the possibility that EZF plays a role in differentiation of epithelial cells.

[Fig. 5 legend, partial] The hybridization signal (black grains) is strong in the living suprabasal layers of cells of the epidermis but there is little or no signal in the non-living cornified outer envelope (arrowhead), the mitotically active basal layer (arrow) or the underlying dermis. There is also little signal associated with a hair follicle (brown pigment is melanin).

[Fig. 6 legend] Binding site analysis. A, the zinc finger domain of the EZF protein was fused in-frame to GST and synthesized in Escherichia coli. The purified recombinant protein was used in gel-shift analysis with oligonucleotides containing binding sites for the various zinc-finger proteins as indicated in Table I. B, recombinant protein binds well to the wild-type EKLF sequence but not to three mutant sequences.

[Fig. 8 legend, partial] The mapping of the reference loci in this interspecific cross (Tsha, Gabrr, Txn, and Jun) has been previously described (26-28). Mapping data for the EZF gene (symbol Zie) have been deposited in the Mouse Genome Database (accession number MGD-CREX-711).
Chromosomal mapping experiments showed that the mouse Ezf gene is located on mouse chromosome 4 in close proximity to the thioredoxin (Txn) gene. A mouse mutation called crinkly tail or cy which results in defects in the caudal vertebrae has been mapped to the same region of chromosome 4 (37). Although the EZF mRNA is transiently expressed in mesenchymal cells of the caudal vertebrae during embryonic development, it is unlikely that a mutation in the Ezf gene is responsible for the cy defect since abnormalities are seen in the tail of cy mice prior to the time at which the Ezf gene is expressed in that organ.
The human EZF gene maps to chromosome 9q22.3-q31, a region which also contains the human thioredoxin (TXN) gene (38). Several human genetic diseases have been linked to that region of the genome including two syndromes involving skin tumors, basal cell nevus syndrome and self-healing squamous epitheliomata (39,40). It has been suggested that these two syndromes are allelic since they are both characterized by skin tumors and map to similar regions of the genome (39). However, there are differences in the types of tumors which arise (basal cell carcinomas versus squamous cell carcinomas) as well as in other manifestations of these syndromes. There are also reports of loss of heterozygosity on chromosome 9q in sporadic basal and squamous cell carcinomas (39,41-43).
Basal cell nevus syndrome and sporadic basal cell carcinomas have recently been linked to mutations in patched, a gene encoding a transmembrane protein involved in the hedgehog signaling pathway (44,45). To date, however, mutations in patched have not been implicated in self-healing squamous epitheliomata or sporadic squamous cell cancers. It is possible that genes other than patched located in the 9q22.3-q31 region could function as tumor suppressors in squamous cell cancers. Since the Ezf gene maps to the correct segment of the genome and is expressed highly in epithelial cells of tissues in which squamous cell cancers arise, it may be involved in the development of these tumors. Further studies will be required to address this question.