Structure and regulation of the envoplakin gene.

Envoplakin, a member of the plakin family of proteins, is a component of desmosomes and the epidermal cornified envelope. To understand how envoplakin expression is regulated, we have analyzed the structure of the mouse envoplakin gene and characterized the promoters of both the human and mouse genes. The mouse gene consists of 22 exons and maps to chromosome 11E1, syntenic to the location of the human gene on 17q25. The exon-intron structure of the mouse envoplakin gene is common to all members of the plakin family: the N-terminal protein domain is encoded by 21 small exons, and the central rod domain and the C-terminal globular domain are encoded by a single large exon. The C terminus shows the highest sequence conservation between mouse and human envoplakins and between envoplakin and the other family members. The N terminus is also conserved, with sequence homology extending to Drosophila Kakapo. A region between nucleotides -101 and -288 was necessary for promoter activity in transiently transfected primary keratinocytes. This region is highly conserved between the human and mouse genes and contains at least two different positively acting elements identified by site-directed mutagenesis and electrophoretic mobility shift assays. Mutation of a GC box binding Sp1 and Sp3 proteins or a combined E box and Krüppel-like element interacting with unidentified nuclear proteins virtually abolished promoter activity. 600 base pairs of the mouse upstream sequence were sufficient to drive expression of a beta-galactosidase reporter gene in the suprabasal layers of epidermis, esophagus, and forestomach of transgenic mice. Thus, we have identified a regulatory region in the envoplakin gene that can account for the expression pattern of the endogenous protein in stratified squamous epithelia.

Plakins are cytoskeletal linker proteins mediating the association of intermediate filaments with cell-cell and cell-extracellular matrix interaction sites (1,2). The common domain structure of the plakins reflects this function. The N-terminal globular domain directs the proteins to membrane localization sites, desmosomes, or hemidesmosomes. The central rod domain forms a coiled coil structure and mediates the assembly of plakins into homodimers and, putatively, higher order structures. Finally, the C-terminal globular domain binds intermediate filament bundles (reviewed in Refs. 1-3). At present, five members of the plakin family of proteins are well characterized. Desmoplakin (4) is an abundant component of the desmosome inner plaque that binds keratin filaments to epithelial cell-cell attachment sites (2,5). Plectin (6, 7) is a ubiquitously expressed protein that is able to bridge intermediate filaments to microtubules and the actin cytoskeleton (8) and is found in desmosomes, hemidesmosomes, and adherens junctions (3). In epithelia BPAG1 (9) is also part of the hemidesmosome plaque (3,5,10), whereas the splice variants of BPAG1 found in neurons bind not only neurofilaments but also actin filaments and microtubules (11,12).
The two newest members of the plakin family are envoplakin and periplakin (13-15), which were originally identified as components of the epidermal cornified envelope, a submembranous layer of transglutaminase cross-linked protein that contributes to the barrier properties of the outermost layers of the skin (16). Envoplakin and periplakin are also found in desmosomes, and it has been proposed that they act as an interdesmosomal scaffold on which the cornified envelope is assembled (13,14,17). A further contribution of envoplakin and periplakin to epidermal barrier function is the covalent attachment of ceramide lipids to these proteins (17). In addition to expression in epidermis, envoplakin and periplakin are found in other stratified squamous epithelia and in two-layered and transitional epithelia such as mammary gland and bladder (14). Although envoplakin and periplakin share the characteristic plakin domain structure, their C-terminal domains are considerably smaller than the respective domains of the other plakins, and they are unique among the plakins in having the potential to heterodimerize with each other (14).
Gene targeting of plakins in mice and characterization of certain human pathologies have underlined the importance of this protein family. Desmoplakin is crucial for the assembly or stability of desmosomes and mice without desmoplakin die at day 6.5 of embryonic development (18). Desmoplakin is haploinsufficient in man, a heterozygous null allele causing a striated palmoplantar hyperkeratosis (19). In mice lack of either plectin or BPAG1 is not embryonically lethal but causes epidermal blistering as a result of the dissociation of keratin bundles from hemidesmosomes (11,20); in addition, the BPAG1 null animals have severe neuronal degeneration (11). Humans with autoantibodies against BPAG1 or mutations resulting in loss of plectin also have skin blistering (21,22). All the plakins, including envoplakin and periplakin, are targets for autoantibodies in paraneoplastic pemphigus, a skin and mucosal blistering disease that develops in some patients with lymphatic malignancies (23,24).
So far, knowledge about the regulation of plakin genes is limited. None of the promoters has been analyzed for tissue-specific regulatory regions in vivo, and only the BPAG1 and periplakin promoters have been characterized by reporter gene transfection in keratinocytes (25,26). To facilitate the analysis of envoplakin function, we have cloned the mouse envoplakin gene and analyzed the envoplakin promoter in cultured keratinocytes and the epidermis of transgenic mice.

EXPERIMENTAL PROCEDURES
Genomic Cloning and Sequencing-Three λ phage clones were isolated and purified from a 129/Sv mouse genomic library using the p210-23 human envoplakin cDNA clone (13) as a probe. Restriction mapping and Southern blotting with several different envoplakin cDNA fragments were carried out to characterize the clones. Overlapping restriction fragments were subcloned into pBluescript KS II for sequencing. DNA sequencing was performed with fluorescent dye-labeled terminators and AmpliTaq thermostable DNA polymerase (Amersham Pharmacia Biotech), and the reactions were run in an ABI automatic sequencer.
Sequence Analysis-Sequences were assembled in the MacVector program. Comparison with the human cDNA sequence was made in MacVector and by using the GAP algorithm in the Genetics Computing Group package. The conceptual translation product of mouse envoplakin was compared with data bases by Blast, ScanPS, Prosite, and Pfam searches (in the Genomic Computing laboratory server in Imperial Cancer Research Fund and in the ProteinPredict servers at the European Molecular Biology Laboratory). Multiple sequence alignment was carried out with the ClustalX program.
Chromosomal Localization-DNA isolated from the phage M210-52 was labeled with biotin-14-ATP using a Bionick kit (Life Technologies, Inc.) and hybridized to metaphase chromosome spreads of normal mouse spleen cells. The hybridized chromosomes were stained with chromosome paints (Cambio) for identification, and the slides were counterstained with diamino-2-phenyl-indole dihydrochloride (Sigma) and mounted in Citifluor (Citifluor Ltd.). Separate images of the probe signal, banding pattern, and the counterstain were pseudocolored and merged using Smartcapture software (Digital Scientific).
Reporter Gene Constructs-The human envoplakin promoter was PCR amplified and cloned directly into the HindIII-BglII site of luciferase reporter gene plasmid pGL3 (Promega). The 3′ end primer, 5′-CCCCAAGCTTCGCTCCTCACTGGCTGGTCA (cloning site underlined), is located in the 5′-untranslated region starting at position +21 (start of the cDNA: zero). Four different 5′ end primers were used: 5′-GATCAGATCTGGTACCCAGTGTGAGGAAAAG for the generation of the plasmid p-1068ELuc, 5′-GGGAAGATCTGGAGGCTGAGGCAGGAGAAT for the plasmid p-670ELuc, 5′-GGGAAGATCTAAACCTTCTGTGGGAGTCGG for the plasmid p-220ELuc, and 5′-GGGAAGATCTAGACTGGTTGTGCAGGAGGA for the plasmid p-158ELuc. A SacI digestion was used to make the plasmid p-363ELuc from the plasmid p-670ELuc, a SmaI-MscI digestion was used to make the plasmid p-288ELuc, and a SmaI-StuI double digestion was used to yield the plasmid p-101ELuc. The plasmid p-1068ΔSS ELuc lacks the SacI-StuI fragment (from −363 to −101). To amplify the mouse envoplakin promoter the primers 5′-GGGAAGATCTCAGATTTGAGAGCAGTTATGG and 5′-CCCCAAAGCTTCGCGTCCTCGCTGGCTACTAG were used for the 5′ and 3′ ends, respectively. The mouse promoter was then digested with HindIII, and the resultant 546-bp fragment was ligated to the HindIII site in the pGL3 polylinker.
Site-directed Mutagenesis-Putative transcription factor binding sites were mutated in the plasmid p-363ELuc using a GeneEditor kit (Promega). The following oligonucleotide primers were used: 5′-CCTTCCCTATCTGGATCCGATCGCCGCTGCG, to change a prospective Krüppel-like factor (Klf) binding site at bp −265 in the promoter to yield plasmid p-363 Mut Klf; 5′-CCCCTAGGCATGTACATGTAACAAGTCCAAC, to mutate overlapping E box and Klf sites at −240 to generate plasmid p-363 Mut E+K; and 5′-GGGCAGGCTCGGCCATGGCCTCAGGGCTGTGC, to mutate an Sp1 site at −190 to yield plasmid p-363 Mut Sp. The mutations were confirmed by restriction enzyme analysis and sequencing. The three mutations create a new BamHI, BsrGI, and NcoI site, respectively.
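The diagnostic restriction sites can be checked directly against the three mutagenic primer sequences. The following Python sketch is an illustration only and not part of the original methods; it assumes the standard recognition sequences for BamHI (GGATCC), BsrGI (TGTACA), and NcoI (CCATGG), and the names `MUTAGENIC_PRIMERS`, `RECOGNITION_SITES`, and `enzymes_with_site` are hypothetical.

```python
# Mutagenic primers as listed in the text (line-break hyphens removed).
MUTAGENIC_PRIMERS = {
    "p-363 Mut Klf": "CCTTCCCTATCTGGATCCGATCGCCGCTGCG",
    "p-363 Mut E+K": "CCCCTAGGCATGTACATGTAACAAGTCCAAC",
    "p-363 Mut Sp": "GGGCAGGCTCGGCCATGGCCTCAGGGCTGTGC",
}

# Assumed standard recognition sequences. All three are palindromic,
# so scanning a single strand is sufficient.
RECOGNITION_SITES = {
    "BamHI": "GGATCC",
    "BsrGI": "TGTACA",
    "NcoI": "CCATGG",
}


def enzymes_with_site(sequence):
    """Return the sorted names of enzymes whose site occurs in the sequence."""
    seq = sequence.upper()
    return sorted(name for name, site in RECOGNITION_SITES.items() if site in seq)


for plasmid, primer in MUTAGENIC_PRIMERS.items():
    print(plasmid, enzymes_with_site(primer))
```

Each primer carries exactly one of the three sites, which is what makes a single diagnostic digest per mutant informative.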
Promoter Analysis by Transient Transfections-Stocks of human primary keratinocytes (kc, km, and kq) were cultured in FAD medium supplemented with 10% fetal calf serum, 0.5 μg/ml hydrocortisone, 5 μg/ml insulin, 10⁻¹⁰ M cholera toxin, and 10 ng/ml epidermal growth factor with a mitomycin C-treated 3T3-J2 feeder layer (27). For transient transfections the cells were grown in serum-free keratinocyte medium (KBM-2, Clonetics) and used at passages 2-6. Cells were adapted to serum-free conditions for at least one passage before transfection.
For transient transfections 1.2-2 × 10⁵ cells/well were plated in 6-well plates. The next day a total of 2.5 μg of plasmid DNA consisting of 2 μg of the luciferase construct and 0.5 μg of β-galactosidase reference plasmid/well was transfected using Superfect reagent (Qiagen). After 3 h the cells were washed with phosphate-buffered saline and fed with fresh medium. Cells were harvested 16-72 h after transfection, and the luciferase and β-galactosidase activities were measured in total cell extracts using the Luciferase Assay System (Promega) and Galacto-Light Plus assay (Tropix), respectively. Luciferase activities were standardized using the β-galactosidase activity of the same extracts.
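The standardization step amounts to dividing each luciferase reading by the β-galactosidase reading from the same extract and then averaging over independent transfections. The Python sketch below shows that arithmetic under stated assumptions: the function names are hypothetical, the readings are invented placeholder values rather than data from this study, and the exact calculation used by the authors is not specified beyond the description above.

```python
from statistics import mean


def normalized_activity(luciferase, beta_gal):
    """Luciferase signal divided by the beta-galactosidase signal of the same extract."""
    if beta_gal <= 0:
        raise ValueError("beta-galactosidase reading must be positive")
    return luciferase / beta_gal


def mean_normalized_activity(readings):
    """Average the normalized activity over independent transfections.

    `readings` is a list of (luciferase, beta_gal) pairs, one per transfection.
    """
    return mean([normalized_activity(luc, gal) for luc, gal in readings])


# Placeholder readings (arbitrary light units), not measured values.
example_readings = [(1000.0, 10.0), (1200.0, 10.0), (550.0, 5.0)]
print(mean_normalized_activity(example_readings))
```

Normalizing within each extract before averaging corrects for well-to-well differences in transfection efficiency.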
Electrophoretic Mobility Shift Assays-Human and mouse primary keratinocytes were cultured as described above except that mouse keratinocytes were grown on collagen-coated dishes (Biocoat, Becton Dickinson) rather than on a feeder layer, and nuclear extracts were isolated from confluent cultures as described (28). Binding of the nuclear extracts to end-labeled double-stranded oligonucleotides was performed as described (28). Top strand sequences for the binding sites were as follows: Klf site, 5′-CTATCTGGGTGTGATCGCCG; E box + Klf site, 5′-TTGTTACACCCCACATGCCTAG; and Sp1 site, 5′-GGCTCGGCCCCGCCCTCAGGG. The following antibodies (all from Santa Cruz) were used to supershift or prevent formation of protein-DNA complexes: M-19 for gut-enriched Klf4, C-20 for upstream stimulatory factor-1, N-262 for c-Myc, (D-20)-G for Sp3, and 1C6 for Sp1. 0.1-2 μg of the antibodies were incubated with the nuclear extracts 10 min before addition of the labeled probe as described (28,29).
Generation and Analysis of Transgenic Mice-The nucleotide sequence around the translation start site of the mouse envoplakin gene was mutated by PCR to a NcoI site (from CCATGT to CCATGG; start codon underlined) to facilitate cloning into the pPSD vector that carries the β-galactosidase gene (30). The plasmid p06EP-LacZ contains the promoter sequences up to a HindIII site at 608 bp upstream from the ATG site. To generate p34EP-LacZ, a 2.8-kb HindIII fragment was subcloned from the clone M210-52 to the HindIII site of linearized p06EP-LacZ. SalI digestion was used for both plasmids to release the insert for microinjection into recipient oocytes, yielding six founders for p34EP-LacZ and seven founders for p06EP-LacZ. DNA-positive mice were screened by PCR using primers 5′-CAGAGACAGCGCACCTGCAGGGA and 5′-GATGGGCGCATCGTAACCGTCA derived from the mouse envoplakin promoter and β-galactosidase gene, respectively.
Mouse genomic DNA isolated from tail snips of PCR-positive animals was digested with PstI or XhoI to evaluate transgene integration and copy number. For Southern blotting an internal 0.4-kb EcoRI fragment from the LacZ gene was used. After screening all the transgene-positive founders for β-galactosidase activity in tail sections, two founders per construct were selected to establish lines. Lines 7782 and 7783 expressing 34EP-LacZ both had a transgene copy number of at least 5. Lines 7494 and 7495 expressing 06EP-LacZ both had a transgene copy number of at least 20. Histochemical staining for β-galactosidase activity was performed as described (31). Representative photographs were scanned into Photoshop 5.0, where the brightness and contrast were adjusted equally for each panel shown.

RESULTS
The Mouse Envoplakin Gene Consists of 22 Exons and Maps to Chromosome 11E1-We cloned the mouse envoplakin gene from a 129/Sv mouse genomic λ library. Three overlapping clones (Fig. 1) were isolated and purified by using the 5′ end of the human envoplakin cDNA as a probe. Sequencing of these clones revealed potential exons that could code for mouse envoplakin. We compared these exons with the human envoplakin cDNA sequence (13) and with the exon-intron structure of the human envoplakin gene (32). Finally, we performed reverse transcription-PCR on mouse keratinocytes to confirm the predicted size of the envoplakin transcript and sequenced several mouse envoplakin N-terminal cDNA fragments (not shown). By these approaches, we determined the exon-intron structure of the entire N terminus and the beginning of the central rod domain of the mouse envoplakin gene.
The 3′ end of the gene was not included in the isolated λ clones. Based on the structure of the human gene we assumed that it would encode the rest of the rod and the C-terminal domain and would comprise a single exon. We used sequence information from mouse expressed sequence tag clones that showed highest similarity to the human envoplakin cDNA (GenBank™ accession numbers AA726169, AA727101, and AA798910) and performed genomic PCR to isolate a 3-kb fragment that, as predicted, encoded the missing part of the gene.
The mouse envoplakin gene consists of 22 exons (Fig. 1). The first 21 exons, corresponding to the large N-terminal protein domain, are small (39-194 bp in length). The sizes of these exons are perfectly conserved between the human and mouse genes. In general, the pattern of exon sizes in the envoplakin gene resembles human periplakin more than the other plakin genes: 13 out of 21 N-terminal exons are identical in size in envoplakin and periplakin. Although small N-terminal exons characterize all the known plakin genes, the sizes of the exons are not conserved among family members, except for a few exons near the end of the N-terminal head domain of the proteins (not shown).
The clone M210-52 was used to determine the chromosomal localization of the mouse envoplakin gene by fluorescence in situ hybridization. The Evpl locus was present as a single copy residing on chromosome 11E1 in 20 metaphases analyzed on chromosome spreads of normal mouse spleen cells (Fig. 2). This region is syntenic to the human chromosomal band 17q25, where the human envoplakin gene has been localized (33).

Conservation of Coding Sequences between Mouse and Human Envoplakin and Other Plakin Family Members-The coding sequence of the mouse envoplakin gene that we compiled predicts a polypeptide of 2035 amino acid residues (Fig. 3). This is one amino acid longer than the corresponding human protein owing to an additional residue at the C terminus before the stop codon. Mouse envoplakin is characterized by the common plakin structure: N-terminal and C-terminal domains are separated by an 828-amino acid rod domain. The boundaries of the rod domain were determined by the CoilScan program in the GCG package that predicts regions with a high probability of forming coiled coil structures (not shown). The same boundaries were seen when using the Coils algorithm in the PredictProtein server. The boundaries for the central rod domain of mouse envoplakin were predicted to differ from those of the human envoplakin protein (13). This reflects the more interrupted structure of the envoplakin rod when compared with predicted rod domains of the other plakins, and the actual borders for the protein domains must await experimental verification.
Comparison of the mouse envoplakin sequence with human envoplakin and other plakins is shown in Table I. Mouse and human envoplakins are highly conserved. The N-terminal globular domain of the protein is similar to other plakin N termini including Kakapo, a Drosophila protein carrying domains homologous to plectin and dystrophin (34,35). The most conserved N-terminal sequence within the whole family is shown in Fig. 4A; it includes a tyrosine residue (amino acid 210) followed by closely spaced leucines and ending with a DWSD motif. Notably this structure is present in both Drosophila and Caenorhabditis elegans Kakapo proteins (GenBank™ accession number for the cosmid that contains the coding sequence is ZK1151) and its mouse homologue, the actin cross-linking protein ACF-7 (mACF-7; Ref. 36). Interestingly, the five times repeated KGS motif in the beginning of envoplakin (Fig. 3) is conserved between mouse and human but is not found in the other plakins.
The newly defined linker domain between the rod and the C-terminal globular domains (14,15) has the highest sequence conservation between human and mouse envoplakin (Table I). The alignment of the linker domains in different plakins (Fig. 4B) indicates the conserved and similar residues between mouse envoplakin and the other plakin family members. The linker domain is lacking in ACF-7 and in Kakapo, the C terminus of which is not homologous to plakins but to dystrophin (34-36).
Conservation of the Human and Mouse Upstream Sequences Indicates Potential Regulatory Regions-The nucleotide sequence of the putative human envoplakin promoter was determined from the cosmid ICRFc105D03119 (33). Comparison of the corresponding mouse sequence revealed a considerable degree of sequence homology (Fig. 5). Both the mouse and human sequences lacked a TATA box and had instead an initiator element consensus sequence only. Conservation was highest around and just upstream from the initiator consensus sequence and in a stretch of about 150 bp starting at nucleotide −137 of the human sequence (Fig. 5). The human promoter has an Alu repetitive element further upstream from the conserved region (from bp −550). The relative locations and sequences of several putative binding sites for transcription factors were well conserved between species. These include several GC and GT boxes for the Sp1 family of transcription factors, E boxes for helix-loop-helix factors, and two binding sites for Krüppel-like transcription factors.

Two Conserved DNA Motifs Are Necessary for High Level Reporter Gene Expression in Primary Human Keratinocytes-The high degree of conservation between the human and mouse envoplakin upstream sequences suggested that important regulatory motifs might lie within the first few hundred base pairs of that region. To test this hypothesis we assayed promoter activity in reporter gene transfections. Both human and mouse upstream sequences were cloned into luciferase reporter gene constructs. In addition, we constructed a deletion series of the potential human promoter. These plasmids were transiently transfected into human primary epidermal keratinocytes. Each plasmid was tested in at least five independent transfections.
The longest fragment of the human upstream sequence tested, extending over 1 kb from the 5′ end of the gene, consistently gave the highest luciferase activities (Fig. 6). These values were on average 500 times higher than those obtained by transfection of the empty vector alone, which indicates that the fragment contains sequences capable of high promoter (or combined promoter and enhancer) activity. The mean activity of this construct (p-1068ELuc) was designated as 100%, and the mean activities of the other constructs were calculated relative to that. The mouse promoter (p-MEPLuc; up to the HindIII site at −523 of the mouse sequence) was on average slightly more active than the longest tested human fragment (Fig. 6), possibly reflecting the presence of an Alu repetitive sequence in the human promoter upstream from the highly conserved region. The shortest fragment of the human promoter tested (p-101ELuc) was only about five times more active than the empty vector alone (Fig. 6).
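The two scales used in this paragraph, fold activity over the empty vector and percentage of the p-1068ELuc reference, are related by a simple ratio. The Python sketch below works through that conversion using the approximate averages quoted above (about 500-fold for p-1068ELuc and about 5-fold for p-101ELuc); the function and constant names are hypothetical, and the result is only as precise as those rounded figures.

```python
def percent_of_reference(fold_over_vector, reference_fold_over_vector):
    """Express a construct's activity as a percentage of the reference construct.

    Both arguments are fold activities relative to the empty vector.
    """
    if reference_fold_over_vector <= 0:
        raise ValueError("reference activity must be positive")
    return 100.0 * fold_over_vector / reference_fold_over_vector


# Approximate averages quoted in the text, relative to the empty vector.
FOLD_P1068 = 500.0  # longest human fragment, defined as 100%
FOLD_P101 = 5.0     # shortest human fragment

print(percent_of_reference(FOLD_P1068, FOLD_P1068))
print(percent_of_reference(FOLD_P101, FOLD_P1068))
```

On these rounded figures the shortest fragment retains roughly 1% of the activity of the longest one.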
In comparing the activity of a series of fragments of the human promoter intermediate in length between p-1068 and p-101, the most remarkable difference was between the shortest promoter and a construct extending to nucleotide −363 (Fig. 6). To test the importance of this region, we deleted it from the full-length promoter. The resultant plasmid, p-1068ΔSS, had as little promoter activity as the shortest (p-101) plasmid. Thus, deletion of the 260-bp fragment rendered the envoplakin promoter inactive. This region includes that part of the upstream sequence that is most highly conserved between human and mouse genes (Fig. 5). Furthermore, even though the more distal sequences (from −363 to −1068) seemingly contain additional positive regulatory elements, the activity of these elements was dependent on the presence of the 260-bp fragment. The activity of the 260-bp fragment could be divided into two additive elements: transfection of the construct p-220ELuc resulted in approximately half the activity of the construct p-363ELuc (Fig. 6).
To further characterize the 260-bp fragment, we mutated three conserved elements harboring putative transcription factor binding sites. The mutations were designed to abolish the consensus binding sites for a Klf site at nucleotide −265 (p-Mut Klf), an overlapping E box and Klf site at −240 (p-Mut E+Klf), and an Sp1 site at −190 (p-Mut Sp1) (Fig. 5). The mutations were compared with the wild type promoter for activity in luciferase reporter gene assays (Fig. 7). The first mutation did not significantly change promoter activity, and an additional deletion construct (p-288ELuc) confirmed that the two basic helix-loop-helix protein binding sites upstream from the −265 Klf site were not needed for high level activity, which further narrowed the critical region to 187 bp between −101 and −288. Mutations p-Mut E+Klf and p-Mut Sp1 very effectively reduced promoter activity, each causing a greater than 20-fold decrease in luciferase activity compared with wild type (Fig. 7). This indicated that an Sp1 site and an element containing overlapping E box and Klf sequences were both necessary for the function of the fragment.
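The fragment sizes quoted for the deletion analysis follow from the promoter coordinates. The Python sketch below reproduces that arithmetic; it assumes a length is taken as the simple difference between the two endpoint coordinates, and the function names are hypothetical. On that convention the region between −101 and −288 is 187 bp, and the SacI-StuI region between −363 and −101 is 262 bp, which the text rounds to 260 bp.

```python
def span_bp(coord_a, coord_b):
    """Length in bp between two promoter coordinates, taken as their difference."""
    return abs(coord_a - coord_b)


def residual_percent(fold_decrease):
    """Activity remaining, as a percentage of wild type, after an n-fold decrease."""
    if fold_decrease <= 0:
        raise ValueError("fold decrease must be positive")
    return 100.0 / fold_decrease


critical_region = span_bp(-101, -288)    # region narrowed by p-288ELuc
deleted_fragment = span_bp(-363, -101)   # SacI-StuI fragment, quoted as 260 bp
upper_bound_residual = residual_percent(20)  # ceiling implied by a >20-fold decrease

print(critical_region, deleted_fragment, upper_bound_residual)
```

A greater than 20-fold decrease therefore corresponds to less than 5% of wild type activity remaining in each of the two mutants.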
To study transcription factor binding to these sites, we employed electrophoretic mobility shift assays (Fig. 8). The Sp1 site was found to interact with Sp1 protein family members in both human (not shown) and mouse keratinocyte nuclear extracts (Fig. 8A). Specific complexes could be disrupted by preincubating the nuclear extracts with antibodies against Sp1 or Sp3, whereas a control antibody (c-Fos) did not prevent complex formation (Fig. 8A). None of the antibodies we tested affected nuclear protein binding to the combined E box and Klf site at −240, indicating that in keratinocytes this site is not occupied by the basic helix-loop-helix transcription factor upstream stimulatory factor-1, by c-Myc, or by any of the Krüppel family members recognized by an antibody against Klf4 (Fig. 8B).

TABLE I Similarity (as a percentage) between mouse envoplakin protein domains and other plakins and plakin-related proteins
Mouse envoplakin protein domains are indicated in Fig. 3. The GAP program in the Genetics Computing Group suite was used to produce an optimal alignment between the mouse envoplakin domains and corresponding protein sequences of other plakins.

600 bp of the Mouse Envoplakin Upstream Sequence Can Direct β-Galactosidase Expression to Differentiated Keratinocytes in Transgenic Mice-We analyzed the tissue specificity of the envoplakin upstream sequences by using LacZ reporter gene constructs in transgenic mice. Two different versions of promoter-LacZ minigenes carrying either 3.4 or 0.6 kb of the mouse envoplakin promoter were constructed. Microinjection of these constructs yielded six and seven founder mice, respectively. LacZ activity was analyzed in tail skin of the founders. For both constructs four founders stained positively for β-galactosidase, and staining was restricted to the suprabasal layers of the epidermis (not shown). For both constructs, two independent lines presenting the strongest staining were bred for detailed analysis.
Histochemical staining for β-galactosidase activity in adult back skin revealed that both of the constructs were able to direct reporter gene expression in the differentiated cell layers of the epidermis (Fig. 9, A and B). No staining was detectable in the dermis or in the basal layer of the epidermis. In hair follicles staining was confined to the inner root sheath. Both promoters were active in the esophagus and forestomach (Fig. 9, C and D). No staining was observed in the epithelium of either bladder or mammary gland (Fig. 9, E and F), even though these epithelia are known to express envoplakin (14). In negative littermates, no β-galactosidase activity was detected in the skin, esophagus, or forestomach (data not shown).

DISCUSSION
Keratinocytes have distinct features that increase the structural integrity and mechanical strength of the epidermis. Desmosomes connect keratin intermediate filaments to a dynamic network extending throughout the tissue (2,8). Cornified envelopes, the functional end point of keratinocyte terminal differentiation, form a rigid and protective protein barrier in the outer layers of the epidermis (for a recent review see Ref. 37). Envoplakin is a component of both desmosomes and cornified envelopes (13) and is thus a potentially important structural protein of the epidermis. In this report we present the genomic organization of mouse envoplakin and determine key regulatory elements in the human and mouse envoplakin promoters.
The mouse envoplakin gene, Evpl, lies on chromosome 11E1, syntenic to human chromosome 17q25, where the corresponding human gene, EVPL, resides (33). In both species, the envoplakin gene is proximal to the acidic keratin gene clusters that localize to 11D in mouse and 17q21-23 in man (38,39). There are several mouse mutations with skin phenotypes, such as bareskin (Bsk) and rex (Re), that segregate with chromosome 11 (40). One mouse mutation, Rim3, affecting skin and hair follicles, has been mapped more accurately to the distal region of chromosome 11 (41) and envoplakin can thus be considered as a potential candidate gene for this disorder. No human diseases have so far been linked to the envoplakin gene. A skin disorder, focal nonepidermolytic palmoplantar keratoderma with esophageal cancer, is mapped to 17q25, but high resolution mapping and sequencing have excluded envoplakin as a candidate gene (32).
The mouse envoplakin gene has 22 exons, like its human counterpart (32) and the human periplakin gene (26). As evaluated by the sizes of the exons and homology at the DNA level, envoplakin and periplakin form a pair of closely related genes that are to some extent divergent from the other plakins. This is further emphasized by the fact that at the protein level mouse and human envoplakin are most similar to periplakin. The linker domain immediately C-terminal to the rod domain is the most conserved part of the plakin family (14,15,23), and some paraneoplastic pemphigus autoantibodies cross-react with C-terminal fusion proteins of envoplakin, periplakin, and desmoplakin (23). The function of this part of the plakin proteins has not been demonstrated conclusively but is likely to be involved, together with the C-terminal repeats, in the interaction of plakins with intermediate filaments (42,43). The presence of a potential protein kinase C phosphorylation site in the linker further suggests a regulatory role in protein interactions for this domain.
FIG. 4. Alignment of conserved protein domains in the plakin family. The Clustal-X program was used for multiple sequence alignment of plakin protein domains. A, the first half of the N terminus contains a conserved sequence that is found in all plakins and in actin cross-linking factor-7. mEPL, mouse envoplakin N-terminal sequence from amino acid residue 204; hEPL, human envoplakin from amino acid 204; mPPL, mouse periplakin from amino acid 191; hPPL, human periplakin from amino acid 192; hDP, human desmoplakin from amino acid 249; hBPAG1, human bullous pemphigoid antigen 1 from amino acid 352; rPLEC, rat plectin from amino acid 190; mACF-7, mouse actin cross-linking factor 7 from amino acid 812; dKAKAPO, Drosophila Kakapo from amino acid 597. B, alignment of the linker sequence in the C terminus of the plakins. Mouse and human envoplakin are shown from amino acid 1684, human periplakin is shown from amino acid 1654, mouse periplakin is shown from amino acid 1655, human desmoplakin is shown from amino acid 2463, rat plectin is shown from amino acid 3731, and human bullous pemphigoid antigen-1 is shown from amino acid 2336.
The putative promoters of both the human and mouse envoplakin genes lack a TATA box. As originally described for housekeeping genes, they have a possible initiator element preceded by several binding sites for transcription factors of the Sp1 family. It is of interest that the envoplakin upstream sequence shares several features with the periplakin promoter (26). Both are TATA-less, harbor several Sp1 sites, and need sequences distal to the basal promoter for optimal expression in cultured keratinocytes. Because envoplakin and periplakin are usually co-expressed and possibly form heterodimers (14), it is conceivable that they are targets for the same signaling pathways and transcription factors.
Using transgenic mice we were able to show that only 600 bp of the mouse envoplakin promoter are needed for gene expression in suprabasal keratinocytes. Moreover, transient transfections into human primary keratinocytes demonstrated that a 187-bp element of the human upstream sequence, which is highly conserved in the mouse, is necessary for high level reporter gene expression. The 187-bp region harbors several fully conserved binding sites for transcription factors, such as Krüppel-like factors and basic helix-loop-helix factors that interact with E box sequences. Notably, no conserved activator protein-1 sites were found in this region, although members of the Fos and Jun protein families are involved in the regulation of several promoters of keratinocyte-specific genes (43-45).
The importance of the conserved Sp1-binding site in the promoter activity of the envoplakin gene is interesting in the light of recent reports on the function of Sp1 in regulating other epithelial genes. Usually, Sp1 does not determine the tissue specificity of a gene on its own but acts co-operatively with other transcription factors such as Ets (in the human transglutaminase 3 promoter; Ref. 47) or AP-2 (in the keratin K3 promoter; Ref. 48). The Sp1 family comprises four proteins, one of which, Sp3, can antagonize the action of Sp1. It has been suggested that the ratio of Sp1 to Sp3 regulates the papilloma virus 16 promoter during epithelial differentiation (49). Sp3 levels are higher in basal keratinocytes, and Sp1 is up-regulated in differentiating keratinocytes. These observations support the conclusion that Sp1 plays a role in activating differentiation-specific genes such as envoplakin (the present study), distal-less Dlx3 (50), and involucrin (51).
FIG. 5. Sequence comparison of human and mouse envoplakin promoters. A sequence comparison between the human (top) and mouse (bottom) envoplakin promoters was produced by the GAP program in the GCG package. ATG translation start site is in bold type. The beginning of the human envoplakin cDNA is in bold italics; this nucleotide was designated as +1. The putative initiator element is underlined, and some consensus binding sites for transcription factors are boxed.
FIG. 6. Deletion analysis of envoplakin gene promoter activity in human keratinocytes. Envoplakin promoter-luciferase constructs were transiently transfected in human foreskin keratinocytes, and luciferase activity was measured 24 h after transfection. p-MEPLuc is the mouse promoter; all the other constructs are human. The raw luciferase values were normalized against co-transfected β-galactosidase reference plasmid. The values are presented as percentages of the most active human promoter fragment, which was assigned the value of 100%. Each bar shows the mean and standard error of five independent transfections (three transfections in the case of p-MEPLuc). The positions of putative transcription factor binding sites are shown above the graph.
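The normalization described in the legend to FIG. 6 (raw luciferase divided by the co-transfected β-galactosidase signal, expressed as a percentage of a reference construct, then summarized as mean and standard error) can be illustrated with a minimal Python sketch. This is not the authors' analysis code. The function names, the construct labels, and all numbers are hypothetical, and the sketch assumes that each transfection is normalized individually before averaging, which the legend does not specify.

```python
from math import sqrt
from statistics import mean, stdev


def normalize(luciferase, beta_gal):
    """Divide each raw luciferase reading by its paired beta-galactosidase reading."""
    return [luc / gal for luc, gal in zip(luciferase, beta_gal)]


def summarize(raw, reference):
    """Return {construct: (mean percent, standard error)} relative to `reference`.

    raw maps a construct name to (luciferase readings, beta-gal readings),
    one pair of readings per independent transfection.
    """
    ratios = {name: normalize(luc, gal) for name, (luc, gal) in raw.items()}
    reference_mean = mean(ratios[reference])  # this construct is assigned 100%
    summary = {}
    for name, values in ratios.items():
        percent = [100.0 * v / reference_mean for v in values]
        # Standard error of the mean; undefined for a single replicate, so report 0.
        sem = stdev(percent) / sqrt(len(percent)) if len(percent) > 1 else 0.0
        summary[name] = (mean(percent), sem)
    return summary


# Made-up readings for two hypothetical constructs, three transfections each.
example = {
    "construct_A": ([200.0, 220.0, 180.0], [2.0, 2.0, 2.0]),
    "construct_B": ([50.0, 55.0, 45.0], [1.0, 1.0, 1.0]),
}
result = summarize(example, reference="construct_A")
```

Passing the reference construct explicitly keeps the choice of the 100% value visible; in the figure that role is played by the most active human promoter fragment.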
Klf4, a member of the Krüppel family of transcription factors, has recently been shown to be crucial for the barrier function of the epidermis (52). Targeted inactivation of Klf4 leads to neonatal death because of transepidermal water loss, and genes regulated by Klf4 in keratinocytes encode cornified envelope proteins (52). We were able to detect a weak binding of Klf proteins to the site at -265 (data not shown) that did not affect envoplakin promoter activity in cultured keratinocytes. In contrast, the element containing overlapping sites for basic helix-loop-helix proteins and Klf proteins that was essential for promoter activity did not interact with any of the candidate factors tested by electrophoretic mobility shift assays. These included upstream stimulatory factor-1, which has been shown to be crucial for syndecan-1 enhancer activity in a keratinocyte cell line (29,53), c-Myc, which is known to regulate epidermal differentiation (54), and any of the several Klf family members recognized by polyclonal antibody M-19 against Klf4 (Santa-Cruz). Future work is needed to determine the exact nature of the transcription factors interacting with the MutE+Klf site in the envoplakin promoter. It is possible that other Klf sites in the envoplakin promoter are used in later stages of epidermal differentiation than could be studied in our transient transfection assays.
FIG. 7. Site-directed mutagenesis of the envoplakin upstream sequence indicates two conserved elements necessary for promoter activity. Point mutations abolishing transcription factor consensus binding sites were introduced into plasmid p-363ELuc. p-Mut Klf removes a consensus Klf site, p-MutE+Klf is at a combined E box and Klf site, and p-MutSp1 is at an Sp1 site. A deletion construct p-288Eluc removed the two most distal E box sequences in p-363ELuc. The constructs were transfected into primary human keratinocytes and luciferase activities were measured 24 h later. The luciferase values were normalized against a β-galactosidase reference plasmid. The means and standard errors of four independent transfections are shown.
FIG. 8. Electrophoretic mobility shift assays of the two DNA elements critical for envoplakin promoter activity. Nuclear extracts from primary mouse keratinocytes were incubated with radiolabeled double-stranded oligonucleotides corresponding to DNA elements critical for promoter activity. A, the Sp1 site (Mut Sp1 at bp -190). Specific complexes were competed by 100-fold excess of unlabeled probe but not by 100-fold excess of a nonspecific probe (N.S.). Preincubation of the nuclear extracts with antibodies against Sp1 or Sp3 perturbed complex formation, whereas preincubation with a c-Fos antibody did not affect complex formation. B, the site containing overlapping basic helix-loop-helix and Klf consensus motifs (at -240) bound specific proteins that were not competed by excess of a nonspecific probe (N.S.). Moreover, the complexes were not affected by antibodies against upstream stimulatory factor-1 (Usf), c-Myc, or Klf4.
Different regulatory regions appear to be responsible for envoplakin expression in stratified squamous epithelia compared with transitional or simple epithelia. Even the 3.4-kb envoplakin promoter failed to express β-galactosidase in mammary gland, in bladder, or in the simple epithelium of gastric mucosa. It is thus likely that other, as yet uncharacterized, regulatory regions exist in the gene. At least in the case of bladder urothelium there is evidence for highly tissue-specific regulatory regions because the promoter of the uroplakin II gene is not active in any other epithelium studied (55).
Epidermal gene expression has been the subject of considerable and continuing research (44,56). A number of promoters of genes up-regulated during keratinocyte terminal differentiation have been characterized previously. These include cornified envelope precursors such as involucrin (51,57), small proline-rich proteins (46), and loricrin (58,59) and desmosomal proteins such as desmoglein-1 (60). The regions needed for epidermal expression of these genes have been elucidated in many cases. Usually, a few kilobases of the promoter are sufficient for correct expression in skin. For example, a 2.5-kb fragment of the human transglutaminase-1 gene (61) and the so-called distal regulatory region (from 1.95 to 2.5 kb upstream) of the human involucrin promoter (51) direct β-galactosidase expression correctly. Likewise, a 4.2-kb fragment of the desmoglein-1 promoter controls epidermal expression, even though it fails to act position-independently and is not sufficient for expression in other stratified epithelia (60). In the loricrin gene, on the contrary, far-upstream sequences between 6.5 and 14 kb are needed for correct expression (59). Interestingly, a short 90-bp promoter fragment of the keratin-5 gene misdirects β-galactosidase expression to suprabasal keratinocytes, even though longer constructs are correctly expressed in basal cells (62). Thus, gene expression can be activated in differentiated keratinocytes both by very proximal promoter elements and by more distal enhancer-like elements. Among those so far studied, the proximal promoter of envoplakin is one of the shortest that is correctly expressed in differentiated keratinocytes and might thus turn out to be useful in biotechnological or therapeutic applications where size constraints of the vectors often limit the range of tissue-specific promoters that can be tested.
Although to date analysis of the promoters of keratinocyte-specific genes has tended to highlight their diversity rather than common elements, evidence is starting to emerge for "master switches" orchestrating the fate of different keratinocyte populations and the subsets of genes expressed in them. Notably, lack of p63 leads to almost total absence of the proliferative and differentiating cell compartments of the epidermis (63,64), and ectopic expression of a mammalian distal-less homologue (Dlx3) in the epidermal basal layer of transgenic mice leads to premature expression of several markers of terminal differentiation (65). As described above, Klf4 may coordinately regulate several cornified envelope genes during late stages of differentiation (52). The relative abundance of Sp1 and Sp3 is also emerging as an important determinant of the expression levels of differentiation-related genes. Combining in vivo experiments in transgenic mice with in vitro mapping of regulatory elements will help to elucidate the signaling pathways that control epidermal differentiation.