Epiplakin Gene Analysis in Mouse Reveals a Single Exon Encoding a 725-kDa Protein with Expression Restricted to Epithelial Tissues*

Based on cDNA cloning and sequencing, human epiplakin has been classified as a member of the plakin protein family of cytolinkers. We report here the characterization of the mouse epiplakin gene locus and the isolation of full-length mouse epiplakin cDNA using BAC vectors. We found that the protein is encoded by a single remarkably large exon (>20 kb) that consists of a series of 0.8–1.5-kb-long DNA repeats, eight of which are virtually identical. Consequently, mouse epiplakin contains 16 plakin repeat domains, three more than reported for the human protein and eight more than predicted for the mouse protein based on the contig characterized by the Mouse Genome Sequencing Consortium. Using antibodies raised to a highly conserved repeating epiplakin sequence domain, we show that the protein in cells is expressed in its full length (725 kDa), and we provide evidence that the size of human epiplakin previously may have been underestimated. In addition we show on transcript and protein levels that epiplakin is restricted to epithelial tissues and that its gene maps to mouse chromosome 15 (human chromosome 8). This study lays the groundwork for future genetic approaches aimed at defining the

Epiplakin was originally identified as a 450-kDa epidermal autoantigen showing immunoreactivity with the serum of a patient suffering from a subepidermal blistering disease (1,2). The recent isolation and sequence analysis of epiplakin cDNA from humans (3) classified the protein as a member of the plakin or cytolinker protein family (for review see Refs. 4 and 5). This family comprises large multidomain proteins that serve as bridging elements between cytoskeletal filaments and as filament anchoring structures of membrane-associated adhesive junctions. Concurrent with their role as cytoskeletal linker proteins, the functional impairment of these proteins leads to diseases accompanied by skin blistering and other types of tissue fragility (6,7).
Desmoplakin, plectin, bullous pemphigoid antigen 1 (BPAG1), microtubule-actin cross-linking factor, envoplakin, periplakin, and epiplakin have emerged to date as plakin family members (4,5). To varying degrees, these proteins share several structural features, such as an amino-terminal actin-binding domain consisting of a pair of calponin homology domains, a plakin domain, a coiled-coil or spectrin repeat-containing rod domain, a microtubule-binding domain, and plakin repeat domains (PRDs) (4). The number of PRDs found in their structure is a hallmark of each plakin. For example, plectin contains six such domains, desmoplakin has three, BPAG1e has two, envoplakin has one, and periplakin has zero. Epiplakin, consisting entirely of PRDs (13 according to Ref. 3) and lacking any of the other structural motifs characteristic of plakins, must be considered an atypical family member. However, the redundancy of its PRDs makes it a very attractive model for studying the specific function(s) of this domain and of plakin cytolinker proteins in general. PRDs comprise a highly conserved ~20-kDa core region, called a module (8,9), and a less conserved linker region of variable length. Modules are composed of four complete and one incomplete tandem copies of a 38-residue-long sequence motif, referred to in data bases as the plectin (PLEC) repeat (9). As predicted on the basis of computer modeling (9), x-ray crystallography showed each of the first four PLEC repeats to consist of a β-hairpin followed by two anti-parallel α-helices, whereas the last PLEC repeat forms a hairpin-helix-loop-hairpin-type motif (10). Although little is known about the function of these modules, the linkers connecting them are thought to serve as binding sites for interacting proteins.
For example, in plectin, the best studied plakin to date, the binding site for intermediate filaments (IFs) of various types (vimentin, cytokeratins, and desmin) has been mapped to a stretch of about 50 amino acid residues in the linker region between PRD modules 5 and 6 (11). For desmoplakin, too, it could be shown that residues downstream of its third PRD module were important for keratin binding, whereas the linker between the second and third module as well as the second PRD module itself have been reported to be required for vimentin interaction (12-15) (note that the PRD modules of desmoplakin are also known as A, B, and C domains). Binding to keratins and neuronal IFs has also been demonstrated for the PRDs of BPAG1e and BPAG1n (16).
To lay a basis for future genetic studies, here we characterized the mouse epiplakin gene locus and determined the nucleotide sequence of the mouse epiplakin cDNA in full. We found that the entire coding sequence of mouse epiplakin is contained within a very large (~20 kb) single exon encoding a series of virtually identical DNA repeats. Only by using BAC technology, but not standard DNA cloning methods, were we able to clone the gene and establish its nucleotide sequence. Our analysis shows that mouse epiplakin contains 16 PRDs, and its size is therefore considerably larger than that reported for the human ortholog. Because we could confirm a similar large size of mouse and human epiplakin by immunoblotting, it is possible that the size of human epiplakin in former studies has been underestimated. In addition, we show the immunolocalization of epiplakin on frozen sections of mouse tissues using a newly generated, remarkably potent antiserum, and we document the tissue distribution of epiplakin transcripts using RNase protection assays. The predicted structure of epiplakin is discussed in comparison with that of other plakin family members.

EXPERIMENTAL PROCEDURES
Isolation of cDNA and DNA Sequencing-cDNA clones were isolated from a mouse skin cDNA library (strain C57BL/6; Stratagene) using an 830-bp-long rat epiplakin cDNA fragment as probe. 5′- and 3′-rapid amplification of cDNA ends as well as PCR analyses of Marathon-Ready™ cDNA derived from mouse kidney and 11-day-old mouse embryo (Clontech, Palo Alto, CA) were performed using Advantage cDNA polymerase (Clontech) in a Perkin-Elmer GeneAmp 9700 thermal cycler, following the protocols supplied by the manufacturers. Nested epiplakin-specific primers were designed with a melting temperature higher than 60°C using the Oligo 4.0 program. Optimized PCR conditions for the first PCR consisted of five cycles of 94°C for 5 s and 72°C for 3 min; five cycles of 94°C for 5 s and 70°C for 3 min; and 30 cycles of 94°C for 5 s and 68°C for 3 min. 2 µl of a 1:50 dilution of the initial PCR was used in a second round of PCR with nested primers (40 cycles of 94°C for 5 s, 64°C for 30 s, and 72°C for 3 min). PCR products were cloned into plasmid pCR2.1 (Invitrogen) for further analyses and sequencing. The nucleotide sequences were determined by the chain termination method using the DyeDeoxy terminator cycle sequencing kit (Applied Biosystems, Foster City, CA).
Isolation of Genomic Clones and Analysis of Exon-Intron Organization-To isolate genomic clones, a mouse genomic library (strain 129; Stratagene, La Jolla, CA) was screened using mouse epiplakin cDNA clone pDS 104 and a 900-bp SalI/XbaI fragment of lambda clone EP1 as probes. Exon-intron boundaries were identified by comparison of genomic DNA and cDNA sequences. The intron was sequenced in its entirety.
Data Base Search and Sequence Alignments-Data base searches were performed using the BLAST program (17). All of the sequence alignments were generated with the LALIGN program (www.ch.embnet.org/software/LALIGN_form.html) using the algorithm of Huang and Miller (18). Secondary structure predictions were made using the programs GOR IV (19), HNN (20), PSIPRED (21), and Jpred (22).
Chromosomal Gene Mapping-Genetic mapping was conducted using the interspecific backcross panel BSS ((C57BL/6JEi × SPRET/Ei)F1 × SPRET/Ei) from the Jackson Laboratory (Bar Harbor, ME) (23). DNA (25 ng) of each panel member was amplified by PCR (40 cycles of 94°C for 10 s, 60°C for 30 s, and 72°C for 2 min and a final extension at 72°C for 7 min) using a high fidelity polymerase mix (24) and primers located in the intron of epiplakin (forward primer: CTCCACTCCCAACCCAGAGCAGGCCCG) and at the 5′ end of exon 2 (reverse primer: GCCCCCACTGAAACCAGCATCAAGAAG). Backcross progeny mice were typed by NruI polymorphism, a restriction site present in C57BL/6J but not Mus spretus. 10 µl of each PCR were digested with NruI in a total volume of 20 µl and analyzed by 1.8% agarose gel electrophoresis. The results were submitted to the Jackson Laboratory to be analyzed using the Map Manager program (25).
RNase Protection Assays-cDNA sequences used as probes were subcloned into pSP64 (Promega, Madison, WI) by PCR cloning using primers flanked with suitable restriction sites. RNA probes were transcribed from linearized plasmids in the presence of [α-32P]GTP (800 Ci/mmol; PerkinElmer Life Sciences) using SP6 RNA polymerase (Roche Applied Science) and subsequently purified on 8% polyacrylamide gels. All antisense RNAs used included a piece of vector sequence to ensure that the obtained bands corresponded exclusively to protected RNA and not to remnants of undigested probes. Total RNA was extracted from tissues according to Chomczynski and Sacchi (26) and quantified by absorbance at 260 nm; the 260/280 nm absorbance ratios were about 1.6. To perform the assay, 10 µg of total RNA were coprecipitated with an excess of probe (50,000 cpm) and hybridized at 60°C for 16 h in 10 µl of 80% formamide, 0.4 M NaCl, 40 mM PIPES, and 1 mM EDTA. To digest RNA, 350 µl of 100 units/ml RNase T1 and 5 µg/ml RNase A (both from Roche Diagnostics) in 10 mM Tris-HCl, pH 7.5, 300 mM NaCl, and 5 mM EDTA were added, and the mixtures were incubated for 1 h at 37°C. The reaction was stopped by adding 10 µl of 20% SDS and 5 µl of proteinase K (20 mg/ml; Merck) followed by incubation for 15 min at 37°C. Samples were extracted with phenol/chloroform, precipitated with ethanol, denatured at 90°C for 3 min, and subsequently analyzed on 8% polyacrylamide gels containing 7.7 M urea. Protected fragments were visualized by autoradiography at −78°C for 20 h up to 4 days. A probe specific for murine ribosomal protein S16 mRNA, yielding a protected band of 90 nucleotides, served as an internal control and was used in all samples. The signals were quantified using a PhosphorImager (Molecular Dynamics, Sunnyvale, CA), and the obtained relative intensities were normalized for the size and guanosine content of the fragments and the amount of RNA loaded as determined by the S16 signal.
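The normalization described above can be sketched as follows. This is only an illustration under stated assumptions: because the probes are labeled with [α-32P]GTP, the signal of a protected fragment is taken to scale with the number of guanosines it contains, and the S16 band is used as the loading reference. The function name and all numbers in the example are hypothetical and are not taken from the study.

```python
def normalize_signal(raw_signal, g_count, s16_signal, s16_g_count):
    """Return an epiplakin signal corrected for label content and RNA loading.

    raw_signal   -- PhosphorImager intensity of the protected epiplakin band
    g_count      -- number of guanosines in the protected epiplakin fragment
    s16_signal   -- PhosphorImager intensity of the S16 control band (same lane)
    s16_g_count  -- number of guanosines in the protected S16 fragment
    """
    # Signal per labeled nucleotide, so probes of different size and
    # guanosine content become comparable (assumption: uniform labeling).
    per_g = raw_signal / g_count
    s16_per_g = s16_signal / s16_g_count
    # Dividing by the S16 value corrects for the amount of RNA loaded.
    return per_g / s16_per_g


# Hypothetical values, for illustration only.
example = normalize_signal(raw_signal=200.0, g_count=50,
                           s16_signal=100.0, s16_g_count=25)
print(example)
```

Values obtained with two different probes for the same tissue could then be averaged, as described for the quantitative analysis.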
Expression of Fusion Proteins in Bacteria-A mouse epiplakin EcoRI cDNA fragment coding for amino acids 6041-6364 was cloned into the bacterial fusion vectors pGEX 4T-1 (Amersham Biosciences) and pMal-c2 (New England Biolabs, Beverly, MA). GST and maltose binding protein fusion proteins were expressed in Escherichia coli BL21 (DE3). GST fusion proteins were solubilized by sonication in 50 mM Tris-HCl, pH 9.0, 2 mM EDTA, 1% Triton X-100 and purified over a glutathione-Sepharose affinity column. Maltose binding protein fusion proteins were brought into solution by sonication in 20 mM Tris-HCl, pH 7.4, 200 mM NaCl, 1 mM EGTA and purified using amylose affinity chromatography.
Preparation of Antibodies-GST epiplakin fusion protein (100 µg) in 200 µl of phosphate-buffered saline was mixed with 350 µl of either Freund's complete (for initial subcutaneous injections) or incomplete (for six booster injections at intervals of 4 weeks) adjuvant (Sigma). To prepare serum, blood was incubated at 37°C for 1 h and subsequently centrifuged for 20 min at 4000 rpm. The antibodies were affinity-purified by column chromatography on maltose binding protein fusion proteins immobilized on Sepharose beads and stored frozen in 0.2 M Tris-HCl, pH 8.0, at a concentration equivalent to that in the serum.
Southern and Northern Blotting-Genomic and BAC DNA were isolated using standard procedures (27) and the large construct kit (Qiagen), respectively. Digested DNA was separated on 0.8% agarose gels at 25 V for 72 h at 4°C. DNA was transferred to a nylon membrane (Hybond N+; Amersham Biosciences), the blots were hybridized with an α-32P-labeled epiplakin fragment corresponding to one of its repeats, and signals were visualized by autoradiography (28). RNA was prepared from tissues using the RNeasy Protect Midi kit (Qiagen). Northern blots were prepared according to Qiagen News (48,49) and probed similarly to Southern blots.
For immunofluorescence microscopy, selected tissues were shock-frozen in isopentane, cooled with liquid N2, sectioned on a cryomicrotome, and fixed with acetone at −20°C. All of the sections (2 µm) were incubated with antibodies to epiplakin at a dilution of 1:10,000, followed by incubation with Texas Red-conjugated goat anti-rabbit IgG (Jackson Immunoresearch Laboratories).

RESULTS
The Mouse Epiplakin Gene: Eight of Sixteen Sequence Repeats Are Nearly Identical-When we screened a rat genomic library with a rat plectin cDNA probe, covering the rod domain and one and a half plectin modules, two overlapping lambda clones were isolated that carried sequences differing from plectin. Analysis of these clones showed that parts of them were highly homologous to published partial sequences of a human epidermal 450-kDa autoantigen (2), later identified as epiplakin (3). To isolate the gene and entire cDNA, a mouse skin cDNA library was screened using a PCR-amplified part of the isolated rat sequence as a probe. The screening yielded a 2420-bp cDNA clone (pDS 104), which contained a stop codon and a poly(A+) signal (Fig. 1). By screening a genomic mouse lambda library with this cDNA, two clones (EP1 and EP10) were obtained, and by gene walking an additional clone (EP4) was obtained, together spanning a total of 22 kb (Fig. 1). Subcloning and sequence analysis of clone EP1 revealed a large (~10 kb) ORF, which was transcribed in its entirety, as shown using a series of overlapping PCRs on mouse cDNA libraries from kidney and 17-day-old embryos (data not shown). 5′-Rapid amplification of cDNA ends using Marathon 17-day-old embryo mouse cDNA yielded 1087 bp of cDNA upstream of the 10-kb ORF. Southern blot and sequence analyses showed that this sequence corresponded to a noncoding novel exon (exon 1), separated from exon 2 by an intron of 6384 bp (Fig. 1). A putative start ATG at the 5′ end of exon 2 fulfilled the Kozak consensus criteria (33) for start codons at positions −3 and +4. Clone EP10, which was partially overlapping with EP1 (Fig. 1), turned out to be unstably replicating in E. coli hosts, because recombination events generated lambda inserts differing in size by about 1.5 kb (data not shown). This rendered clone EP10 unfit for any further analysis.
To isolate and characterize the genomic region 3′ of clone EP1, we used a BAC clone shown by BLAST search to harbor epiplakin genomic sequences (RPCI23, clone 208H22). The characterization of this clone by restriction enzyme digestion and Southern blotting enabled us to identify a large genomic region (>12 kb), starting near the 3′ end of clone EP1 and consisting of an unknown number of ~1.5 kb repeats (Fig. 1). The length of the repeats was determined by digestion of the clone with either BamHI or BglII and Southern blotting of the fragments using a carboxyl-terminal epiplakin fragment as a probe (data not shown). Because no suitable singular restriction sites could be found in this region, the entire gene segment containing the repeats was subcloned into a bacterial artificial chromosome (BAC) vector (pBeloBac 11) via two flanking HindIII restriction enzyme sites to yield clone pDS 166 (Fig. 1). KpnI digestion of clone pDS 166 and gel electrophoresis of the fragments showed that the region containing the repeats comprised ~13.8 kb (Fig. 2A). Knowing the exact length of the sequences flanking the repeat region (5′ KpnI-BglII fragment, 1140 bp; 3′ BglII-KpnI fragment, 345 bp), the number of 1.5 kb repeats could be defined as eight (1140 + 8 × 1545 + 345 = 13,845 bp).
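The repeat count given above follows from simple arithmetic, which can be sketched as below. The fragment lengths are those stated in the text; the gel estimate of 13,800 bp is our reading of "~13.8 kb", and the function name is hypothetical.

```python
# Lengths in bp, as stated in the text.
KPNI_FRAGMENT_GEL_BP = 13_800   # assumption: "~13.8 kb" read from the gel
FLANK_5_BP = 1_140              # 5' KpnI-BglII fragment
FLANK_3_BP = 345                # 3' BglII-KpnI fragment
REPEAT_UNIT_BP = 1_545          # length of one BglII/BamHI repeat


def estimate_repeat_count(total_bp, flank5_bp, flank3_bp, unit_bp):
    """Estimate how many tandem repeat units fit between the two flanks."""
    repeat_region_bp = total_bp - flank5_bp - flank3_bp
    # The gel measurement is approximate, so round to the nearest integer.
    return round(repeat_region_bp / unit_bp)


n_repeats = estimate_repeat_count(
    KPNI_FRAGMENT_GEL_BP, FLANK_5_BP, FLANK_3_BP, REPEAT_UNIT_BP
)

# Exact fragment length implied by that integer number of repeats.
reconstructed_bp = FLANK_5_BP + n_repeats * REPEAT_UNIT_BP + FLANK_3_BP

print(n_repeats, reconstructed_bp)  # 8 13845
```

With seven or nine repeats the fragment would be 12,300 or 15,390 bp, so neighboring counts differ by a full 1.5 kb from the observed size.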
Because their sequence identity may make these repeats subject to recombination events in meiosis, we examined the genomic stability of the epiplakin locus in three different mouse strains. Southern blot analysis of KpnI-digested DNA from strains B6, CBA, and 129 revealed no differences in the length of fragments (Fig. 2B), confirming genomic integrity of all repeats in different genetic backgrounds.
In the absence of singular restriction sites in clone pDS 166 and because of its sequence redundancy, the analysis of its exact nucleotide sequence posed a major challenge. Following a shotgun cloning strategy, clone pDS 166 was digested with either BglII or BamHI, and the 1.5-kb fragments generated were subcloned (Fig. 3A). Because of the lack of potentially recombining similar sequences, single 1.5-kb clones were stably replicating and therefore could be subjected to sequence analysis. In total 21 BamHI and 30 BglII clones were analyzed. Additional sequence information about the 5′ and 3′ ends of this highly homologous repeat region was obtained from clones pDS 167 and pDS 170 that were isolated using singular upstream or downstream restriction sites. Based on this analysis we could identify four different classes of BamHI and five different classes of BglII subclones (Fig. 3B). Sequence alignment of a 5′ SpeI-BamHI fragment (pDS 167) with BglII fragments enabled the extension of the exact sequence until the 3′ end of the first BglII fragment, because only one class of BglII fragments (class I) revealed the same nucleotides (T, T, and G) at variable positions 1-3 (Fig. 3, B and C). However, gene walking in the 3′ direction of clone pDS 167 by searching for a unique class of overlapping BamHI clones was not possible, because more than one class (classes I and III) matched nucleotides A and C at the variable positions 4 and 5 of the BglII class I clone. Performing BglII partial digestion, we were also able to clone the very 3′ 1.5-kb BglII fragment as part of a BglII-KpnI subclone (pDS 170) of pDS 166 (Fig. 3A). Sequence analysis of this clone revealed the nucleotides C, C, A, T, C, and T at variable positions 1-5, and 1, respectively (Fig. 3B).
Again, however, it was impossible to extend the sequence of pDS 170 in the 5′ direction, because no unique classes were found, but two BamHI subclone classes (classes I and IV) were identified with nucleotides C, C, and A at variable positions 1-3 of the overlapping region. Consequently, the sequences of only two (the first and last) of the eight 1.5-kb BglII fragments identified could unambiguously be determined, and the order of the other six repeats remained undefined. Because all of the isolated BamHI fragments carried a cytosine at the variable positions 1 and 2, these positions could be specified in all repeats, leaving just three positions (positions 3-5) in repeats 2-7 undetermined (Fig. 3C). Consequently, a sequence could be deduced that lacked exact nucleotide specifications only at three positions within each of the six interior BglII 1.5-kb fragments. The two alternative nucleotides found at each of these positions gave rise to different amino acid residues only at positions 3 (Gly or Glu) and 4 (Glu or Val) (Fig. 3C). At position 5 the alternative nucleotides were at the wobble position of lysine codons. In summary, our analysis revealed that the mouse epiplakin gene comprises two exons, a noncoding exon 1 and a large (~20 kb) coding exon 2 (Fig. 1). Overall, the isolated lambda clones and BAC subclones resulted in a continuous span of 42 kb of genomic sequence.
Exceptionally Large Size of Epiplakin Confirmed on Transcript and Protein Levels-A Northern blot analysis of RNA isolated from mouse salivary gland using an 802-bp carboxyl-terminal cDNA fragment of epiplakin as a probe for hybridization revealed the presence of a very large transcript (Fig. 4A). Based on the size of plectin transcripts (34), the size of epiplakin transcripts was estimated as ~22 kb. This correlated very well with the transcript size predicted: 1.5-kb 5′-untranslated region, 20-kb ORF, and 1-kb 3′-untranslated region (Fig. 1). In the absence of other detectable signals, the expression of other major isoforms caused by splice events within exon 2 of epiplakin could be ruled out.
The 19,644-bp ORF of epiplakin potentially encodes a protein of 725 kDa. To confirm this size we raised antibodies to a GST fusion protein containing a fragment of epiplakin (residues 6041-6364) without any significant sequence homologies to plectin, desmoplakin, or other proteins. The antibodies obtained were affinity-purified and used for Western blotting. Analyzing proteins extracted from a newborn mouse, we could detect a single band of molecular weight considerably higher than that of plectin (>500,000), fitting very well the expected molecular mass of 725 kDa (Fig. 4B). An immunoreactive protein band of similar size was observed when protein extracts of human HaCaT cells were analyzed, indicating that human epiplakin had the same apparent molecular weight as the mouse protein (Fig. 4C). This deviated from an earlier report where the molecular mass of human epiplakin was predicted to be 552 kDa (3).
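As a rough plausibility check, the predicted masses can be approximated from the ORF lengths alone. The sketch below is not the calculation used in the study: it assumes that each ORF length includes the stop codon and uses a generic average residue mass of 110 Da, whereas an exact mass would be computed from the actual amino acid sequence. The human ORF length (15,195 bp) is the value cited from Ref. 3 in the Discussion.

```python
AVG_RESIDUE_MASS_DA = 110.0  # assumption: generic average mass per residue


def orf_to_residues(orf_bp, includes_stop=True):
    """Number of amino acids encoded by an ORF of the given length."""
    codons = orf_bp // 3
    # Assumption: the stated ORF length counts the stop codon.
    return codons - 1 if includes_stop else codons


def approx_mass_kda(orf_bp):
    """Approximate protein mass in kDa from ORF length."""
    return orf_to_residues(orf_bp) * AVG_RESIDUE_MASS_DA / 1000.0


mouse_residues = orf_to_residues(19_644)
mouse_kda = approx_mass_kda(19_644)
human_kda = approx_mass_kda(15_195)

print(mouse_residues)      # 6547
print(round(mouse_kda))    # 720
print(round(human_kda))    # 557
```

Both estimates fall within about 1% of the 725 kDa and 552 kDa figures quoted in the text, which is the agreement expected from an average residue mass.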
Epiplakin Structure Predictions-The predicted amino acid sequence comprised 16 PRDs, each containing a linker followed by a module (Fig. 5A). The modules were homologous to the B-type modules of plectin, desmoplakin, and BPAG1 (9, 35). The number of modules identified (16 modules) differs from the 13 domains previously reported for human epiplakin (3). This discrepancy is due to the different numbers of almost identical repeat domains found in the carboxyl-terminal part of the protein. Only five such repeats were reported to be expressed in humans, whereas eight were identified in this study. Contrary to the linker regions, those parts of epiplakin repeat domains corresponding to modules were highly conserved compared with other plakin family members. Furthermore, similarities between human and mouse epiplakin were more pronounced among modules (e.g. ~90% sequence identity of module 9) compared with linkers (~63% of the corresponding linkers). A closer look identified two groups of B-type domains: group I with ~70% identity (modules 3, 6, and 8-16) and group II with ~50% or less identity to the first B-type domain of plectin (modules 1, 2, 4, 5, and 7) (Table I). A similar differentiation has been made recently in an analysis of the crystal structure of B- and C-type domains of desmoplakin (10). The proposed characteristics of the type B module of desmoplakin are found only for group I but not group II modules of epiplakin.
In defining the start of each module, the amino acids alanine and glycine at positions 3 and 4 were used (Fig. 5B, lower panel). This definition applies to all modules found in any of the plakins known, except for module 4 of epiplakin (Gly-Gly) and the third domain of desmoplakin (Ala-Ala). Each module of epiplakin is composed of five tandem copies of a repeat motif, in data bases (SMART, smart.embl-heidelberg.de, and Pfam, www.sanger.ac.uk/cgi-bin/Pfam, respectively) defined as the PLEC repeat (Fig. 5B). The first PLEC repeats (comprising 42 amino acids in modules 1-7 and 47 amino acids in modules 8-16, respectively) form a hairpin-helix-loop-helix (β2α2) motif. Repeats 2, 3, and 4 (composed of 38 amino acid residues) exhibit a similar structure. In contrast, the fifth and last repeat in each module (comprising only 32 residues) exhibits a hairpin-helix-loop-hairpin-type structure (β2αβ2). The linker regions between the modules can be divided into four different types: Linker 1 (type I) is unique and shows no homology to the rest of the protein. Type II, III, and IV linkers differ in their length but are at least partially homologous to each other (Fig. 6). Linker 2 (type II), the shortest, shows homology to the carboxyl-terminal part of linker types III and IV. Linkers 3-8 belong to type III, and linkers 9-16 belong to type IV. Type IV linkers, which are virtually identical in primary structure, are the longest linkers and can be subdivided into three homology regions. Regions 1 and 2 are homologous to each other and to the amino-terminal part of type III linkers, whereas region 3 shows homology to type III linkers along their entire length (Fig. 6). Interestingly, the only two positions in the 1.5-kb sequence repeats, where our sequence analysis did not allow the assignment of one of two possible amino acid residues (Fig. 3C), were located within the linker domains.
The other nucleotide differences detected, which did not affect amino acid exchanges, were located in regions encoding epiplakin modules. It follows that modules 8-15 were fully identical in sequence (Fig. 3).
The Epiplakin Gene Is a Close Neighbor of the Plectin Gene-To determine the chromosomal localization of the murine epiplakin gene, interspecific backcross analysis using DNA from progeny derived from mating of ((C57BL/6JEi × SPRET/Ei) × SPRET/Ei) mice was carried out. To identify polymor- (Fig. 7). The chromosomal localization of epiplakin was recently confirmed by new entries in the data base (www.ncbi.nlm.nih.gov/genome/guide/mouse), which mapped the mouse epiplakin gene to the supercontig NW_000106.1 at chromosome 15. A comparison of the contig sequence harboring epiplakin (BAC clone AC110211) and our genomic sequence revealed that the online sequence contained the 5′ and 3′ portions of the gene but only two of the eight virtually identical DNA repeats identified in our analysis. The plectin gene was identified as a close neighbor, because its 3′ end was found to be only ~60 kb apart from the 5′ end of the epiplakin gene. A comparison of the human epiplakin cDNA reported earlier (3) and the contig of human chromosome 8 recently established by the International Human Genome Sequencing Consortium (NT_037703) revealed that only the first seven DNA repeats plus the first of the following virtually identical repeats could be identified by the shotgun approach of the consortium. Similar to mouse, the human genes of plectin and epiplakin are close neighbors on chromosome 8, being separated by a mere 45 kb.
Epiplakin Is Predominantly Expressed in Epithelial Tissues-To quantify epiplakin transcripts in tissue extracts, RNase protection assays were carried out using two different epiplakin-antisense riboprobes (specific for the 3′-untranslated region and the amino terminus of epiplakin) and a ribosomal protein S16-specific probe for standardization (Fig. 8). Both epiplakin-specific probes revealed high levels of expression in skin, small intestine, and salivary gland, comparatively lower levels in lung, uterus, and liver, and no detectable expression in brain, kidney, muscle, heart, and spleen. By and large, this pattern was consistent with the immunolocalization data of epiplakin on cryosections of various mouse tissues (Fig. 9). Strong epiplakin-specific signals were found in all cell layers of the epidermis, whereas no signals could be detected in the subjacent dermis (Fig. 9A). In small intestine (Fig. 9B), epiplakin was exclusively expressed in the epithelial cell layer of the villi, whereas the inner, connective tissue showed no staining. This correlated well with negative immunoblotting results obtained with protein extracts from mouse fibroblasts (data not shown). In liver (Fig. 9C), epiplakin was prominently expressed at the margins of hepatocytes, with additional less pronounced staining of bile canaliculi and of patchy or partly filamentous structures distributed throughout the cytoplasm. In salivary gland and pancreas (Fig. 9, D and E), epiplakin-specific staining was found in the cubic epithelium of the ducts and in myoepithelial cells. No signals could be detected in skeletal muscles, brain, and kidney (Fig. 9F and data not shown). Controls using rabbit nonimmune serum were negative (data not shown). Immunofluorescence microscopy of mouse keratinocytes revealed no filamentous cytoplasmic staining, contrary to expectations considering the subcellular localization of other plakin family members (data not shown).

DISCUSSION
Our detailed analyses of the mouse epiplakin gene locus and of epiplakin cDNA revealed that the protein is a 725-kDa translation product of a ~22-kb mRNA. Of the two exons constituting its gene, only one (with a remarkable size of ~20 kb) was found to be coding, whereas the preceding first one was noncoding. The size of the single coding exon of epiplakin exceeds that of the longest single exon reported by the International Human Genome Sequencing Consortium, a 17.1-kb exon of the titin gene (36). An even longer coding exon containing tandem repeat domains (~34 kb) was reported for submaxillary mucin in pig (37). Another intriguing feature of the epiplakin gene is the existence of eight virtually identical 1.5-kb DNA repeats, arranged in tandem at the 3′ end of the coding sequence. Sequence analysis revealed six different types of such 1.5-kb DNA repeats. Their alignment disclosed differences in nucleotides at only five positions. We were able to determine the type and therefore the exact sequence of the first and the last of the eight repeats; the order of the remaining six could not be defined. As a result the nearly 20-kb-long coding sequence of the mouse epiplakin gene could be determined for all but 18 nucleotides (three positions in each of the six interior repeats) residing within a span of ~9 kb. For each of these 18 positions only two alternative bases remained an option. Despite the near-identity of some of its DNA repeats, the epiplakin gene seems to be genomically stable as far as the number of transmitted repeats is concerned, as shown by our analysis of three different inbred mouse strains.
The isolation and analysis of the genomic locus of epiplakin, especially of the part comprising the 1.5-kb DNA repeats, was a technical challenge. In standard cloning approaches the number of DNA repeats was not stably maintained because of recombination events in E. coli. Only the use of the BAC cloning system enabled the isolation of these repeats in full and their subsequent sequence analysis. Furthermore, because all PCR-based methods and cDNA isolation techniques meet the same problems, the technical approach taken would seem to be the only reliable way to identify the exact length, sequence, and, ultimately, structure of epiplakin, or similarly repetitive protein structures. In this context it is not surprising that the shotgun sequencing strategy used by the mouse genome sequencing project was not successful in determining the whole sequence of the epiplakin gene. The detailed analysis of the epiplakin gene locus reported here will help to bridge a gap in the draft sequence of the mouse genome, which probably would not have been closed by conventional sequencing approaches in the near future.

... schematics, the modules are shown as black boxes, and linkers are shown as lines. Repeat domain numbers are indicated. Note the different lengths of linker types I (PRD 1), II (PRD 2), III (PRD 3-8), and IV (PRD 9-16). In the lower part, the five PLEC repeats, constituting each module, are aligned, and conserved amino acid residues at positions 3 and 4 are shown as white letters in black boxes. Conserved hydrophobic residues are marked in gray. In module 1 (representative for all others) predicted α-helices are boxed, and β-strands are double-underlined. Structure prediction was based on the three-dimensional structure of desmoplakin (10). Asterisks mark the 5-amino acid inserts (Gly-Glu-Pro-Gly-Arg) in modules 8-16 after position 10 of PLEC repeat 1; identical modules 8-15 are shown only once.
FIG. 6. Sequence alignment of mouse epiplakin type II, III and IV linkers. Types and numbers of linkers and positions of amino acid residues are indicated. Note that linker 9 (representing also linkers 10-16) is divided into the three homologous parts 9/1, 9/2, and 9/3. Identical residues are shown as white letters in black boxes, similar amino acids are highlighted in gray. α-Helices and β-strands, predicted by at least three of four secondary structure prediction methods used (see text), are boxed and underlined, respectively. Regions of conserved secondary structures are shown as cylinders (α-helices) and an arrow (β-strand) below the alignment.
Total RNA prepared from tissues indicated was analyzed by RNase protection assays using sequences corresponding to an amino-terminal part and to the 3′-untranslated region of epiplakin as RNA probes. An S16 probe served as control for RNA quantification. A, autoradiography of RNase-protected bands. Only relevant parts of the gels are shown. B, quantitative analysis of signals shown in A. Analysis was done using a PhosphorImager, and the values (arbitrary units) were normalized for the RNA loading (S16) and guanosine content of different probes. The mean values of measurements with both probes are shown.
The single epiplakin-specific signal detected by Northern blotting corresponded well to the expected size of the full-length transcript (~22 kb), ruling out any major splice events. Furthermore, only one major immunoreactive protein band of ~725 kDa could be detected by immunoblotting analysis of newborn mice lysates, suggesting that proteolytic processing of full-length expressed protein was not a major issue. The different lengths of the mouse (this report) and the human (3) epiplakin cDNA ORFs (19644 versus 15195 bp) were due to different numbers of 1.5-kb DNA repeats identified in each case (eight in mouse versus five in human), whereas the remaining parts of the sequences were found homologous to each other. Because only classical screening and PCR-based methods were used for the isolation of human epiplakin cDNA, one or more of the 1.5-kb repeats could have easily been missed. This was supported by our immunoblotting data showing that human and mouse keratinocyte epiplakins have a similar apparent high molecular mass of >700 kDa, clearly above the 552 kDa reported for the human species (3). However, the exact size of human epiplakin remains to be identified using a methodological approach similar to the one presented in this study.
RNase protection assays revealed high level expression of epiplakin transcripts in mouse skin, small intestine, and salivary gland and lower expression in liver, lung, and uterus. No expression could be detected in brain, muscle, heart, kidney, or spleen. Considering that epiplakin fragments were successfully amplified from a kidney cDNA library, the lack of any signal for kidney in these assays suggested that only trace amounts of epiplakin mRNA were present in this tissue. Thus, in mouse we found expression of epiplakin transcripts to be largely restricted to epithelial tissues, at variance with human epiplakin, which was reported to be widely distributed in a variety of tissues (3). The expression pattern of mouse epiplakin mRNA was in agreement with immunolocalization data of the protein on cryosections of tissues, showing its expression in all layers of the epidermis, in the epithelial layer of the small intestine, in the cubic epithelium of pancreas and salivary gland, and in liver. The more prominent expression of epiplakin observed in suprabasal compared with basal keratinocytes of the epidermis may indicate a role of epiplakin in skin barrier function, as suggested for other plakins (38,39).
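The relation between the ORF lengths and the protein masses discussed above can be checked with a rough back-of-the-envelope calculation. The following Python sketch is illustrative only and rests on two assumptions that are not taken from this report: an average residue mass of about 110 Da (a common rule of thumb) and ORF lengths that include the stop codon. The function name `orf_to_mass_kda` is hypothetical.

```python
# Rough estimate of protein mass from ORF length.
# ASSUMPTION: ~110 Da average mass per amino acid residue (rule of thumb).
AVG_RESIDUE_MASS_DA = 110.0


def orf_to_mass_kda(orf_bp: int, includes_stop: bool = True) -> float:
    """Return an approximate protein mass in kDa for an ORF of orf_bp base pairs.

    ASSUMPTION: includes_stop=True means the ORF length counts the stop codon,
    which does not encode a residue.
    """
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_bp // 3
    residues = codons - 1 if includes_stop else codons
    return residues * AVG_RESIDUE_MASS_DA / 1000.0


# ORF lengths quoted in the text (bp).
MOUSE_ORF_BP = 19644
HUMAN_ORF_BP = 15195

mouse_kda = orf_to_mass_kda(MOUSE_ORF_BP)
human_kda = orf_to_mass_kda(HUMAN_ORF_BP)

# Length difference between the two ORFs, spread over the three additional
# 1.5-kb repeats found in mouse (eight versus five).
diff_bp = MOUSE_ORF_BP - HUMAN_ORF_BP
bp_per_extra_repeat = diff_bp / 3

print(f"mouse ~{mouse_kda:.0f} kDa, human ~{human_kda:.0f} kDa")
print(f"ORF difference {diff_bp} bp, ~{bp_per_extra_repeat:.0f} bp per extra repeat")
```

Under these assumptions the estimates come out close to the ~725 kDa and 552 kDa figures quoted in the text, and the ORF length difference is consistent with three additional repeats of roughly 1.5 kb each. The exact values depend on the true amino acid composition.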
Mouse epiplakin can be considered as a highly ordered protein structure consisting of 16 homologous parts (PRDs), each one of them composed of a module and a linker domain. All modules and, with the exception of the first, all linkers show similarity among each other. The 16 epiplakin modules each are composed of five structural motifs known as PLEC repeats (9), based on which epiplakin counts as a member of the plakin protein family (4,5). However, in the absence of other molecular domains shared with plakins, epiplakin is atypical for this protein family. Its closest relative with regard to sequence homology of PRDs would be plectin. This close relationship is further strengthened by the fact that plectin contains the highest number of PRDs of all plakins besides epiplakin and that the epiplakin and plectin genes are just 60 and 45 kb apart on mouse chromosome 15 (40) and human chromosome 8 (34), respectively. On these grounds it can be assumed that epiplakin is a relatively young protein in vertebrate evolution and probably emerged through a duplication and subsequent amplification of a plectin PRD. Several plakins have been shown to harbor binding sites for various types of IFs in their PRDs (11, 13-16, 41-47). Epiplakin, containing a multitude of such domains, might therefore be expected to contain one or more such sites. However, neither the human nor the mouse protein species harbors sequences characteristic for previously well characterized essential IF-binding sites of plakins. In particular they lack the versatile IF interaction domain, originally identified in the linker domain between PRD modules 5 and 6 of plectin (11), which is found also in other plakins, including desmoplakin and module-less periplakin. Because epiplakin, in addition, lacks an actin-binding domain found at the amino terminus of several other plakins, it remains to be shown whether it qualifies as a true cytolinker.
Its unique, highly ordered structure and especially the eight almost identical carboxyl-terminal repeat domains may be perfect preconditions for its putative role as a scaffolding platform providing multiple docking sites for complex protein machineries such as those involved in signaling. This study lays the groundwork for future genetic approaches aimed at establishing the biological role of this unique protein, especially because it opens the door for generating genetically altered mice.