Gene Structure and cDNA Sequence Identify the Beaded Filament Protein CP49 as a Highly Divergent Type I Intermediate Filament Protein*

The fiber cell of the vertebrate ocular lens assembles a cytoskeletal structure, the beaded filament, which contains two proteins unique to the fiber cell: CP49 (phakinin) and CP115/CP95 (filensin). We report here the complete primary sequence and gene structure for human CP49. These data show that CP49 is a member of the intermediate filament family, but highly unusual in several regards. 1) CP49 primary sequence does not permit unambiguous assignment to any existing class of intermediate filament protein, but exhibits a gene structure that is identical to the Type I cytokeratins. 2) CP49 essentially lacks one of the three major domains that characterize all intermediate filament proteins, the carboxyl-terminal tail domain. 3) CP49 shows substitutions at 3 of 4 residues in the otherwise highly conserved intermediate filament protein motif LNDR. Notably, this divergence includes an Arg to Cys substitution that has only been observed in the mutant human cytokeratin K14, a mutation shown to cause the skin blistering seen in the genetic disorder Dowling-Meara epidermolysis bullosa simplex.
The differentiated fiber cells of the vertebrate ocular lens assemble a cytoskeletal structure referred to as the beaded filament (1). This structure, a 5-nm diameter filament decorated by periodic beads, is morphologically distinct from both actin-containing thin filaments and vimentin-containing 10-nm intermediate filaments (IFs), which are also present in these cells (2-5). Immunocytochemistry, heavy meromyosin labeling, and cell fractionation/co-enrichment studies have shown that beaded filaments are biochemically distinct from established cytoskeletal elements as well (6-9). Two proteins, CP49 (phakinin (10)) and CP115/CP95 (filensin (11)), have been localized to the beaded filament. Both proteins have been shown by Northern and Western blotting to be expressed only in the lens and only in the differentiated fiber cells of the lens (7,9,12,13). Also tightly associated with the plasma membrane-cytoskeleton complex, including the beaded filament, is α-crystallin (14-18). α-Crystallin is the most abundant protein in the lens and exists predominantly as a soluble cytoplasmic protein. However, a small fraction of the total α-crystallin resists extraction from the plasma membrane-cytoskeleton complex and has been immunocytochemically localized to both beaded filaments and intermediate filaments (16). While the α-crystallin bound to the cytoskeleton is a small percentage of the total α-crystallin pool, it is quantitatively a major component of the cytoskeletal fraction. The role of the α-crystallin in the cytoskeletal fraction is not yet clear, but its function as a chaperonin has recently been linked to the dynamics of cytoskeleton assembly in the lens (19).
The complete cDNA sequence for bovine filensin (CP115/CP95) has been published, revealing it to be an 85-kDa protein that shows strong sequence similarity to the intermediate filament family of proteins (20). This finding was largely unexpected since antibody probes that were generally considered diagnostic for IF proteins did not react with either CP49 or filensin (21). More significantly, cytoplasmic IF proteins have not been demonstrated in structures other than the classical 8-11-nm intermediate filaments (22,23). Thus, filensin was the first cytoplasmic IF protein to be localized to a non-IF cytoskeletal structure (9,24). Analysis of filensin's primary sequence revealed several other features that were atypical of IF proteins (13,20,25), including a primary sequence that was not clearly related to existing IF classes.
Partial sequence of mouse CP49 has been published, establishing that it, too, is an IF-like protein (26). Sequence data for bovine CP49 (11) and chicken CP49 (27,28) have emerged as well and confirm CP49's relationship to the IF family. However, like filensin, the primary sequence of CP49 did not show a level of identity that permitted unambiguous assignment of CP49 to any existing class of IF protein, nor was its sequence closely related to its assembly partner, filensin. Thus, the issue of whether CP49 and filensin represented novel IF classes has been left undefined. Furthermore, the reported sequences for chicken CP49 and bovine CP49 showed a dramatic divergence in the amino-terminal head domain. Since homologous IF proteins are usually well conserved, this divergence was highly unusual.
We report here the complete cDNA and predicted amino acid sequences of human CP49 as well as the structure of the human CP49 gene. This represents the first report of CP49 gene structure and the first complete and correct report of a mammalian CP49 primary sequence. These data show that 1) CP49, despite the exceptional degree of sequence divergence, is a Type I cytokeratin; 2) CP49 essentially lacks a carboxyl-terminal tail domain; and 3) CP49 shows substitutions at 3 of 4 residues in the otherwise highly conserved IF protein motif LNDR, a motif considered critical to IF assembly and one that has been demonstrated to be a "hot spot" for mutations that cause human skin blistering disorders. We also describe a corrected sequence for bovine CP49 that resolves the unexpected differences seen between the human and bovine CP49 head domains. These data combine with those reported for filensin to establish that a non-intermediate filament cytoskeletal structure has been assembled from two proteins recruited from the intermediate filament family.

MATERIALS AND METHODS
Human lens total RNA was isolated from human donor lenses provided by the Lions Eye and Tissue Bank (Sacramento, CA) following the guanidine isothiocyanate/acid phenol extraction protocol of Chomczynski and Sacchi (29). Ten micrograms of RNA was reverse-transcribed using Superscript reverse transcriptase (Life Technologies, Inc.) following the manufacturer's instructions, using dT-adapter primers (30), CP49-specific primers, or random hexamer primers. Following cDNA synthesis, the reaction was heated to 65°C for 10 min and diluted to 500 μl with 10 mM Tris, pH 8.0, 1 mM EDTA. For PCR, 5 μl of cDNA was used as input template; cDNA pools were stored at −20°C.
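As a rough check on template input, the volumes given above imply that each PCR received cDNA derived from about 0.1 μg of total RNA. The sketch below assumes that all of the input RNA is represented in the diluted pool and that the final pool volume is exactly 500 μl; the variable and function names are illustrative only.

```python
# RNA-equivalent input per PCR, from the volumes stated in the text.
# Assumption: all 10 ug of RNA is represented in the 500-ul diluted cDNA pool.
rna_input_ug = 10.0      # total RNA reverse-transcribed
pool_volume_ul = 500.0   # final volume after dilution in Tris/EDTA
template_ul = 5.0        # cDNA volume used per PCR


def rna_equivalent_ug(rna_ug, pool_ul, aliquot_ul):
    """Return the mass of starting RNA represented by one aliquot of the pool."""
    return rna_ug * aliquot_ul / pool_ul


per_reaction_ug = rna_equivalent_ug(rna_input_ug, pool_volume_ul, template_ul)
print(f"{per_reaction_ug:.2f} ug RNA equivalent per PCR")
```

This is an upper bound on usable template, since reverse transcription is not expected to be fully efficient.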
Following the determination of the chromosomal location of CP49 sequences (31), we purchased the following chromosome 3-specific libraries from American Type Culture Collection: 57751, 57717, and 57748. Plating and screening of the phage libraries were performed using standard techniques (32). Library 57751 was screened with human CP49 cDNA sequences using radioactively labeled probes synthesized using the Pharmacia oligolabeling kit. Screening of the library and purification of positive phage were performed as described (32).
Inserts from positive phage were characterized by PCR. Initially, PCR was used with exon-specific primers to generate intron/exon boundary fragments for cloning and sequencing. Subsequently, phage genomic DNA was isolated and purified, digested with HindIII, and cloned into pSP72. In some cases, oligonucleotides specific for the sequences flanking the HindIII site of the Charon 21A phage were used (primers were generously provided by Beverly Allen, University of Florida). Amplification of the entire phage insert was performed using 35 cycles of 94°C for 30 s, 52°C for 30 s, and 72°C for 3 min.
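For planning purposes, the cycling program for whole-insert amplification can be totaled as follows. This is a minimal sketch that counts only the programmed hold times stated above; ramp times and any initial denaturation or final extension steps are not given in the text and are ignored here.

```python
# Programmed hold time for the whole-insert amplification described above.
# Assumption: ramp times and any initial/final hold steps are ignored.
cycles = 35
step_seconds = {"denature_94C": 30, "anneal_52C": 30, "extend_72C": 180}


def total_cycling_minutes(n_cycles, steps):
    """Sum the hold times of one cycle and scale by the number of cycles."""
    return n_cycles * sum(steps.values()) / 60


minutes = total_cycling_minutes(cycles, step_seconds)
print(f"{minutes:.0f} min of programmed hold time")
```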
Human CP49 genomic clones were sequenced using oligonucleotide primers chemically synthesized by the University of California Davis Protein Structure Laboratory. The entire cDNA sequence presented was determined for both strands. Intron sequences presented have been determined for one strand. Comparisons of the determined sequences with data bases were performed using the University of Wisconsin Genetics Computer Group package, using fastapep.cmo with default settings.

RESULTS
Human CP49 Primary Sequence-Published murine (26) and corrected bovine (GenBank accession numbers X75160 and U12016) CP49 cDNA sequences were aligned for the purpose of identifying conserved regions. Some of these regions were selected for the design of oligonucleotides that were used in PCR to amplify human lens cDNA, and amplified products were cloned and sequenced.
Collectively, these PCR-amplified products encompassed the entire open reading frame of human CP49 plus the 3′-untranslated region. Subsequent sequencing of human genomic fragments from a genomic library confirmed the majority of the nucleotide sequences obtained from PCR products. The open reading frame was 1245 bases, encoding 415 amino acids, with a predicted molecular mass of 45,835 Da and a pI of 5.30 (Fig. 1). In previous work, Northern blotting with human CP49 probes established that the sequence was specific to human and of appropriate size (26).
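The reported figures are internally consistent, as a short calculation shows. The sketch below uses only the numbers stated above; the inference that the 1245-base length excludes the stop codon follows from the exact division by three and is an assumption, not a statement from the text. The mean residue mass it reports can be compared with the commonly used rule of thumb of roughly 110 Da per residue.

```python
# Consistency check on the reported open reading frame and predicted mass.
orf_nt = 1245               # open reading frame length in bases (from the text)
predicted_mass_da = 45835   # predicted molecular mass in Da (from the text)

# 1245 divides evenly by 3, so the length appears to exclude the stop codon.
codons, remainder = divmod(orf_nt, 3)
mean_residue_mass = predicted_mass_da / codons

print(f"{codons} codons, remainder {remainder}")
print(f"mean residue mass: {mean_residue_mass:.1f} Da")
```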
The deduced amino acid sequences of the entire human CP49 protein and the human CP49 rod domain were both compared with the SWISSPROT data base. The best match was the partial mouse CP49 sequence, at 85.8% identity, a level of identity consistent with the strong conservation seen between homologous IF proteins from different species. However, the murine sequence extended only from the middle of coil 1b to the COOH terminus; thus, comparison of the amino-terminal end of the molecule was not possible. Human CP49 and the published bovine CP49 sequences (10) showed little similarity in the first 67 amino acids of the amino-terminal head domain, but aligned well from that point on. Again, because homologous IF proteins tend to be highly conserved, this dramatic divergence was surprising. To address this, we used PCR to confirm the bovine CP49 sequence and established that the published bovine cDNA sequence had omitted 2 nucleotides, resulting in a frameshift mistranslation of the first 67 residues. The corrected bovine sequence (GenBank accession number U12016) shows an 89% level of identity to the human sequence reported here.
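The effect of a 2-nucleotide omission can be pictured with a toy example. In the sketch below the reading frame is fixed by the 3′ portion of the sequence, standing in for the well-aligned region downstream of the error, so that codons upstream of the omission are all read out of frame. The sequence and the position of the omission are invented for illustration and are not taken from the bovine cDNA.

```python
def codons_anchored_3prime(seq):
    """Split seq into triplets in the frame fixed by its 3' end."""
    offset = len(seq) % 3  # leftover 5' bases that do not fill a codon
    return [seq[i:i + 3] for i in range(offset, len(seq), 3)]


original = "ATGGCTAAGGTTCTGCGTGAA"       # toy sequence, 7 codons
omitted = original[:13] + original[15:]  # drop 2 nt inside the fifth codon

ref = codons_anchored_3prime(original)
obs = codons_anchored_3prime(omitted)

# Codons downstream of the omission are unchanged ...
downstream_same = ref[-2:] == obs[-2:]
# ... while every codon upstream of it is read in a shifted frame.
upstream_same = [r == o for r, o in zip(ref[-6:-2], obs[-6:-2])]

print(ref)
print(obs)
print(downstream_same, upstream_same)
```

An omission of one or two bases shifts the frame in this way, whereas an omission of three bases would remove a single codon and leave the rest of the translation intact.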
The subsequent 35 best matches between human CP49 and the SWISSPROT data base are shown in Fig. 2. All are IF proteins, establishing a clear relationship between CP49 and the IF family.
Secondary Structural Features-Among IF proteins, the amino-terminal head and carboxyl-terminal tail domains are quite variable in both size and sequence. The central rod domains are better conserved, both in primary sequence and, to an even greater degree, predicted secondary structural features. Thus, classification of a novel sequence as a member of the IF family is based on secondary structural features as well as primary sequence identity.
To assess CP49 for the presence of secondary structural features that are characteristic of IF proteins, CP49 was aligned with the Type I cytokeratins K10 and K18. These two proteins were the closest human matches produced by the data base search, and both have been well characterized with respect to secondary structure (33). This alignment, shown in Fig. 3, permitted the identification of the central rod domain of CP49 as well as the subsequent identification of the coil and linker regions that characterize IF protein central rod domains.
The optimal alignment of CP49 and K10 permitted clear identification of CP49's variations of the highly conserved LNDR and TYRKLLEGE motifs near the ends of the central rod domain (Fig. 3, boldface) as well as essentially all of the major secondary structural features that are highly conserved among IF proteins (22,23). Among the conserved features identified were the following. 1) A central rod domain of appropriate overall size, 311 amino acids (34). 2) "Coil" domains within the central rod that exhibit a heptad repeat pattern of amino acids, in which positions 1 and 4 of the heptad (Fig. 3, asterisks) are dominated by, but not exclusively occupied by, apolar residues (35). This heptad repeat pattern is predictive of α-helical secondary structure, and thus, these regions are referred to as coils. IF proteins typically exhibit three major coil domains, 1a, 1b, and 2, whose size and position are well conserved. The number, position, and length of the coil regions in CP49 are consistent with those seen among IF proteins (34) and are summarized in Fig. 3. Also characteristic of many IF proteins is a "stutter" in the heptad repeats in coil 2, where the repeat pattern is interrupted (34). 3) Between coil regions are short segments that lack the heptad repeat pattern and thus the predicted α-helicity. These are referred to as "linkers." The size, location, and number of linker regions in CP49 are consistent with those conserved among IF proteins.
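The heptad criterion described above can be expressed as a simple score: the fraction of heptad positions 1 and 4 that are occupied by apolar residues, evaluated in each of the seven possible registers. The following is a minimal sketch, not the method used for Fig. 3; the set of residues treated as apolar and the toy segment are assumptions made for illustration.

```python
APOLAR = set("AVLIMFWC")  # assumption: one simple choice of apolar residues


def heptad_apolar_fraction(segment, register=0):
    """Fraction of heptad positions 1 and 4 occupied by apolar residues.

    register is the index of the first residue taken as heptad position 1.
    """
    hits = total = 0
    for i, aa in enumerate(segment[register:]):
        if i % 7 in (0, 3):  # offsets 0 and 3 are heptad positions 1 and 4
            total += 1
            hits += aa in APOLAR
    return hits / total if total else 0.0


def best_register(segment):
    """Return the register (0-6) with the highest apolar fraction."""
    return max(range(7), key=lambda r: heptad_apolar_fraction(segment, r))


toy = "SG" + "LEELKKK" * 3  # hypothetical segment: 2-residue lead, 3 heptads
print(best_register(toy), heptad_apolar_fraction(toy, best_register(toy)))
```

In a real coil the score would be high but below 1, since positions 1 and 4 are dominated by apolar residues without being exclusively apolar; a linker would show no register with a clearly elevated score.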
The most notable departure of CP49 from the consensus domain structure of IF proteins is the virtual absence of a COOH-terminal tail domain in CP49. CP49 primary sequence terminates almost immediately after the end of the central rod domain, giving it the most abbreviated tail domain of any IF protein defined to date. Thus, with the exception of a missing/truncated tail domain, CP49 exhibits a predicted secondary structure that is indistinguishable from the highly conserved domain structure that characterizes IF proteins.
The LNDR sequence at the beginning of coil 1a (Fig. 3, shaded) is among the most highly conserved motifs in IF proteins (22, 34, 36-38). Human CP49 shows substitutions at 3 of 4 residues, from LNDR to LGGC, the most significant divergence yet seen among the IF proteins. We have established the identical LGGC sequence in mouse CP49, and it has been reported in bovine CP49 (10) as well, but not in chicken CP49 (28).
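The substitution count can be verified by a position-by-position comparison of the consensus motif with the CP49 sequence. The short sketch below uses only the two four-residue strings given above; the variable names are illustrative.

```python
# Position-by-position comparison of the consensus motif with human CP49.
consensus = "LNDR"
cp49 = "LGGC"

# Each entry is (position, consensus residue, CP49 residue), 1-based.
substitutions = [
    (i + 1, c, v)
    for i, (c, v) in enumerate(zip(consensus, cp49))
    if c != v
]

print(f"{len(substitutions)} of {len(consensus)} residues substituted")
print(substitutions)
```

The last entry corresponds to the Arg to Cys change at the fourth position of the motif.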
Classification of CP49-Since the primary sequences and overall sizes of the head and tail domains of IF proteins are more variable, IF proteins have been grouped into classes or types on the basis of the degree of sequence identity among central rod domains (23). The rod domain of CP49, identified by alignment with Type I cytokeratins in Fig. 3, was compared with the SWISSPROT data base to determine whether, on the basis of sequence identity, CP49 could be assigned to any existing class of IF protein. Fig. 4 shows the results of this comparison.
When ranked on the basis of percent identity in the rod domain, the best 21 matches were Type I cytokeratins. However, the level of identity between CP49 and any Type I cytokeratin did not exceed 36.1% and ranged down to 27%. The most similar human Type II, IV, and III IF proteins showed 28.2, 27.2, and 27.0% identity, respectively. Thus, the level of sequence identity between CP49 and the Type I cytokeratins is well below that usually seen among members of the Type I class (50-90%) and is more typical of the level of identity seen between IF classes (<40%) (23,35). On the basis of sequence identity, then, CP49 does not fit readily into any of the existing IF classes.
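The reasoning in this paragraph amounts to comparing each best-match identity against two reference ranges. The sketch below encodes the values quoted above; treating 50% and 40% as hard cutoffs is a simplification made here for illustration, since the text gives them only as typical ranges.

```python
# Best rod-domain identity (%) between human CP49 and each IF class (from the text).
best_identity = {"Type I": 36.1, "Type II": 28.2, "Type IV": 27.2, "Type III": 27.0}

WITHIN_CLASS_MIN = 50.0   # lower end of the 50-90% range seen within Type I
BETWEEN_CLASS_MAX = 40.0  # identities below this are typical between classes


def interpret(pct):
    """Place a percent identity relative to the two reference ranges."""
    if pct >= WITHIN_CLASS_MIN:
        return "within-class range"
    if pct < BETWEEN_CLASS_MAX:
        return "between-class range"
    return "indeterminate"


calls = {name: interpret(pct) for name, pct in best_identity.items()}
ranked = sorted(best_identity, key=best_identity.get, reverse=True)
margin = round(best_identity[ranked[0]] - best_identity[ranked[1]], 1)

print(calls)
print(f"closest class: {ranked[0]}, ahead of {ranked[1]} by {margin} points")
```

Every class, including the closest one, falls in the between-class range, which is why identity alone does not settle the assignment.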
CP49 Gene Structure-Analysis of IF protein genes has revealed that conservation of IF protein gene structure parallels that seen in primary sequence: intron number and position have been well conserved, but type-specific variations in the number and location of introns have also emerged. Thus, gene structure constitutes an independent mechanism for the classification of an IF protein (35,37,39,40).
A schematic showing the intron locations in the human CP49 gene and comparing them with those of other IF classes is presented in Fig. 5. Intron locations in the CP49 gene were defined by comparison of genomic and cDNA sequences.
As seen in Fig. 5, intron A is found only in Type II cytokeratins. Because intron A is absent from other classes of IF proteins, it is considered diagnostic for that group. PCR amplification of genomic DNA and analysis of genomic DNA isolated from a phage library provide no evidence for intron A in the human CP49 gene. Thus, the human CP49 gene lacks an intron that is conserved among Type II cytokeratins.
Intron B is conserved in Type I and II IF genes. PCR amplification of this region from CP49 genomic DNA was not successful. However, we have isolated genomic sequences encompassing intron B from a phage library and have sequenced the intron/exon boundary from that purified phage DNA. DNA sequencing shows that the human CP49 gene contains an intron between codons for Gln163 and Val164, confirming the presence of intron B.
The exact location of intron C was determined from sequencing human genomic DNA from a CP49-positive phage. Oligonucleotide primers 3953 and 3815 produced a 1.5-kilobase pair amplification product that was sequenced from each end, confirming the cDNA sequence and locating each end of the intron. Sequence data demonstrate that within the codon for CP49 Arg191 is a 1.2-kilobase pair intron. This amplification product also shows that no intron is present at CP49 amino acid 210, the predicted location of intron D, which is conserved among Type II and III, but not Type I, IF genes. DNA sequencing of this region was also performed on templates isolated without PCR, yielding identical results.
The location of intron E was determined from a PCR product obtained during characterization of CP49 genomic DNA. Oligonucleotide primers 4149 and 4151 were used to produce a 2-kilobase pair product that was obtained only in limiting quantities with genomic input DNA. The DNA sequence of the amplified fragment again confirms the cDNA sequence and shows an ~1.8-kilobase pair intron between Glu243 and Asp244. Sequence was confirmed in genomic DNA not amplified by PCR. Introns F and G each contain HindIII sites and were isolated in two parts. Both the 5′- and 3′-ends of each intron have been sequenced, and the cDNA sequence was confirmed.
Intron H, important in characterizing the potential relationship between CP49 and K19, was contained in a genomic phage. DNA sequencing of this region of CP49 shows the presence of intron H within the codon for the last amino acid of CP49.
The phase of the triplet codon that is interrupted by an intron has been a generally well conserved feature among IF proteins and provides an additional means of verifying the authenticity of a conserved intron. Fig. 6 depicts the nucleotide sequences of the intron/exon boundaries of CP49 and identifies the phase of the triplet codon at which introns have been inserted. Introns B, E, F, and G interrupt the DNA sequence between triplet codons, while introns C and H both split the triplet codon after the second nucleotide. This pattern is in keeping with that conserved among IF genes (41,42).
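Intron phase can be computed from the number of coding nucleotides that precede the intron: phase 0 falls between codons, and phases 1 and 2 fall after the first and second nucleotide of a codon, respectively. The sketch below applies this to the introns whose positions are stated in the text (B, C, E, and H). It assumes that residue numbers count from the initiator methionine as codon 1 and that the last amino acid of CP49 is residue 415; the positions of introns F and G are not given in the text and are therefore omitted.

```python
def coding_nt_before(codon_number, nt_into_codon):
    """Coding nucleotides upstream of an intron located after nt_into_codon
    bases of the given 1-based codon (3 means just after the whole codon)."""
    return 3 * (codon_number - 1) + nt_into_codon


def intron_phase(nt_before):
    """Phase 0: between codons; 1 or 2: after that many bases of a codon."""
    return nt_before % 3


# (codon number, bases of that codon preceding the intron), from the text.
# Assumption: residue numbering starts at the initiator Met as codon 1.
introns = {
    "B": (163, 3),  # between Gln163 and Val164
    "C": (191, 2),  # within the Arg191 codon, after the second nucleotide
    "E": (243, 3),  # between Glu243 and Asp244
    "H": (415, 2),  # within the last codon, after the second nucleotide
}

phases = {name: intron_phase(coding_nt_before(*pos)) for name, pos in introns.items()}
print(phases)
```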
These data show that the number and precise location of introns in CP49 are identical to those seen in Type I cytokeratins, with the exception of those introns that would have been located in CP49's missing tail domain. Notably, CP49 lacks intron D, which is found in both Type II and Type III IF genes.
Tailless CP49 Versus Tailless K19-K19 is a Type I cytokeratin that has an abbreviated tail domain, extending 13 amino acids beyond the end of the central rod domain (43,44). K19 is also unique among Type I cytokeratins in lacking the intron (designated here as "H") (Fig. 5) that all other Type I IF genes have at the end of the central rod domain (43). This has led to the hypothesis that K19 diverged from the rest of the Type I cytokeratins at a relatively early point (44). CP49's abbreviated tail and general low level of sequence identity to other Type I keratins suggested the possibility that CP49 might be more closely related to K19 and represent an addition to the K19 branch of Type I IF proteins. However, DNA sequence analysis of a phage insert containing human CP49 genomic sequences identified the presence of an intron within the last codon of CP49. The presence of this intron is consistent with Type I keratin origins, but not K19 evolutionary origins.

FIG. 4. Comparison of the human CP49 rod domain with the SWISSPROT data base. To explore the relationship of CP49 to existing IF protein classes, the rod domain of human CP49 was compared with the SWISSPROT data base. The best 35 matches are shown and listed in order. Additionally, the next six best human matches are included, retaining their numerical ranking, 39-48. micfib, microfibrillary protein; NF-H, neurofilament protein-heavy; NF-M, neurofilament protein-medium; other abbreviations as in Fig. 2 legend.
FIG. 5. Comparison of intron locations. The locations of introns in the human CP49 gene are compared with those found in the rod domains of Type II, III, and I IF proteins. Also shown is K19, a Type I cytokeratin, but one that is unusual in lacking intron H. Introns A-H are shown. Vertical bars mark the approximate location of each intron as well as the IF protein type in which it is found. The three major domains of IF proteins are indicated, as are subdomains of the central rod region.

DISCUSSION
We report here the complete cDNA and amino acid sequences for human CP49, the first complete and correct report of a mammalian CP49 sequence, and the first description of CP49 gene structure. Comparison of CP49 sequence with the SWISSPROT data base shows that the best 35 matches are IF proteins, clearly linking CP49 to the IF family. CP49's membership in the IF family is further supported by analysis of its predicted secondary structure. Alignment of CP49 with the closest Type I cytokeratin matches permits clear identification of a central rod domain in CP49 as well as all of the secondary structural features that characterize the rod domains of IF proteins and that are considered diagnostic for IF proteins (22, 23, 34-36, 45, 46). Collectively, these data constitute very strong evidence that CP49 is a bona fide member of the IF family.
While overall sequence and predicted secondary structure make a compelling case that CP49 is an IF protein, primary sequence data do not make a strong case for placement of CP49 within any of the existing classes of IF proteins. The CP49 rod domain is most similar to the Type I cytokeratins (Fig. 4), but the level of identity between CP49 and other Type I IF proteins is much lower than that usually seen among members of an IF class and is more typical of that seen between classes. Indeed, comparison of CP49 central rod domain sequence with that of other types of IF proteins showed only modestly lower levels of identity to these other types (Fig. 4). Thus, primary sequence similarity did not permit a confident assignment of CP49 to an existing class of IF protein and suggested that CP49 might constitute a novel class of IF protein. To further clarify the relationship between CP49 and the IF family, we defined the human CP49 gene structure. Within a class of IF proteins, the number and location of introns are strongly conserved, and the presence/absence of a particular intron can be diagnostic for IF class (35,41,42). Thus, classification based on primary sequence can be alternatively confirmed or refuted by examination of gene structure. Our data show that CP49 gene structure is identical to that of the Type I cytokeratins (35, 41). Specifically, CP49 lacks intron A (Fig. 5), which is characteristic of Type II cytokeratins, and intron D, found in Type II and III, but not Type I, IF proteins. Finally, the CP49 gene retains intron B, which is unique to Type I cytokeratins. The identical gene structure between CP49 and the Type I cytokeratins is evidence that their similarity arises from sharing a common origin rather than by convergent evolution. CP49 thus exhibits a degree of sequence divergence not previously reported among the Type I cytokeratins.
This determination of CP49 gene structure also bears on the relationship between CP49 and K19. The primary sequence of K19 extends ~13 amino acids beyond the end of the central rod domain, an unusually short carboxyl-terminal tail domain. In fact, K19 has been referred to as the "tailless" keratin because of this feature. Thus, CP49 and K19 share a common and very unusual feature among IF proteins: a highly abbreviated carboxyl-terminal tail domain. K19 is also unusual among the Type I cytokeratins in lacking an intron that is located at the end of the central rod domain and that is conserved among Type I, II, and III IF proteins. Instead, the exon encoding the end of the central rod domain of K19 extends several dozen bases beyond the site where that intron would have occurred, encoding the abbreviated carboxyl-terminal tail. Thus, while K19 is considered a Type I IF protein, this variation in gene structure has led to the suggestion that K19 has diverged slightly from the Type I family (44). The abbreviated tail domains of K19 and CP49, combined with the low level of identity between CP49 and the Type I keratins, might be considered evidence that K19 and CP49 were closely related. Our determination that the CP49 gene contains intron H (Fig. 5), which is present in all Type I acidic cytokeratins, but not in K19, makes this evolutionary relationship unlikely.
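The gene structure argument can be summarized as a comparison of presence/absence profiles for the diagnostic introns. The sketch below encodes only introns A, D, and H as described in the text; the remaining introns are omitted because their distribution across classes is not fully specified here. The K19 entries for introns A and D are inferred from its Type I membership and are an assumption of this sketch.

```python
# Presence (True) or absence (False) of diagnostic introns, as described in the text.
PROFILES = {
    "Type I":   {"A": False, "D": False, "H": True},
    "Type II":  {"A": True,  "D": True,  "H": True},
    "Type III": {"A": False, "D": True,  "H": True},
    # Assumption: K19 follows the Type I pattern for A and D; it lacks H.
    "K19":      {"A": False, "D": False, "H": False},
}

cp49 = {"A": False, "D": False, "H": True}  # observed in the human CP49 gene

# Report every reference profile that CP49 matches exactly.
matches = [name for name, profile in PROFILES.items() if profile == cp49]
print(matches)
```

Under these assumptions CP49 matches the Type I profile alone, and the presence of intron H is the single feature that separates it from K19.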
CP49, while clearly an IF protein, also exhibits features that are unique among IF proteins. 1) CP49, as indicated, essentially lacks a carboxyl-terminal tail domain, with the amino acid sequence terminating at or near the end of coil 2. The absence of a tail domain is conserved among the human, bovine, and murine CP49 proteins. 2) CP49 shows the greatest degree of sequence divergence in one of the most highly conserved motifs among IF proteins, the LNDR motif found near the beginning of coil 1a (Fig. 3, shaded). The capacity for in vitro assembly of IF proteins into 10-nm filaments is extremely sensitive to changes in this region. Among IF proteins, 1 and very rarely 2 residues will vary from the consensus LNDR motif. CP49 shows three substitutions, LNDR to LGGC, the greatest degree of substitution yet reported. Of particular interest is the Arg to Cys switch at the fourth position. The inherited human skin disorder Dowling-Meara epidermolysis bullosa simplex has been shown to be caused by a point mutation in the Type I cytokeratin K14 that results in this same Arg to Cys switch. The importance of this point mutation and its role in Dowling-Meara epidermolysis bullosa simplex are supported by studies on the in vitro assembly of mutant K5/K14, where the introduction of this mutation proves sufficient to disrupt assembly. Finally, engineering this mutation in transgenic mice results in an epidermolysis bullosa simplex-like phenotype (23,47-50). This Arg to Cys substitution is seen in human as well as bovine (11) and murine CP49 proteins, but not in chicken CP49. Thus, a variation seen in the LNDR motif of human CP49 and conserved among mammalian CP49 proteins is pathogenic when it occurs in the human Type I cytokeratin K14.
Type I acidic cytokeratins are also characterized by their obligatory co-assembly with Type II neutral-basic cytokeratins into a heterodimer at the first stage of filament assembly. The mature 10-nm filament is therefore a 1:1 mixture of Type I and II proteins. If CP49 represented a bona fide Type I acidic cytokeratin, it would be predicted to have an acidic pI, and its natural assembly partner would be predicted to be a Type II cytokeratin with a neutral-basic pI. CP49, with a pI of ~5.3, is consistent with this prediction. However, the filensin sequence, if related to a Type II protein, has diverged considerably and predicts an even more acidic pI of 5.1, rather than the expected neutral-basic pI. Masaki and Watanabe (13) analyzed a partial sequence for rat filensin and concluded that it was similar to Type II cytokeratins. Subsequently, Gounari et al. (20) have analyzed the complete bovine filensin sequence and found that regions of the central rod domain exhibited similarities to Type III, IV, and VI IF proteins. Thus, additional evidence is required to determine the relationship of filensin to existing IF classes. At this juncture, it is unclear whether CP49 and filensin represent a highly specialized keratin pair or a unique combination of IF proteins from different classes. Preliminary data on the gene structure of filensin suggest that it has similarities to Type II cytokeratin genes.
The accumulated data now clearly establish filensin and CP49 as IF proteins. However, sequence analyses establish both as highly unusual, a finding consistent with their presence in a nontraditional IF structure. The most provocative questions that derive from these observations are both why and how these two proteins assemble into a non-IF cytoskeletal element. How two IF proteins assemble into a non-IF structure is unknown, but the significant changes in the primary structure of these proteins would seem a likely explanation.
Interestingly, we and others have shown that in vitro, purified CP49 and filensin can assemble into classical 10-nm filaments. This suggests that the two proteins can be directed into alternative assembly pathways and, as a corollary, that some additional factor or environment is necessary to direct assembly into beaded filaments. Alternatively, the beaded filament may represent a stabilized intermediate in the process of 10-nm filament assembly. Sauk et al. (Fig. 4a of Ref. 51) have shown a beaded filament-like structure occurring in the process of in vitro filament assembly from cytokeratin pairs. Why a beaded filament occurs in the lens fiber cell is an equally compelling question. Intermediate filament networks composed of vimentin are present in the lens epithelium and newly differentiated fiber cell, but disappear from older fiber cells (52,53). In a seemingly complementary manner, beaded filament proteins are not expressed in the epithelium and first emerge in the maturing fiber cell and then persist well into the lens (7,12,54). In those regions where the two networks coexist, they appear to be independent of one another. The fact that these two beaded filament proteins are expressed only in the lens and are not expressed until the process of differentiation commences would argue for a unique fiber cell-specific function, as yet undetermined.
While the establishment of a discrete function for the beaded filament is likely to prove difficult, the two proteins, CP49 and filensin, which compose the beaded filament, have demonstrated highly unusual features that have extended the limits of the IF family. The foreshortened rod domain of filensin and the unusual sequence and secondary features of CP49 both provide naturally occurring "mutants" that should aid in our investigation of the mechanism by which IF proteins assemble and of their evolutionary origins.