Characterization of a 190-Kilobase Pair Domain of Human Type I Hair Keratin Genes*

Polymerase chain reaction-based screening of an arrayed human P1 artificial chromosome (PAC) library using primer pairs specific for the human type I hair keratins hHa3-II or hHa6 led to the isolation of two PAC clones, which covered 190 kilobase pairs (kbp) of genomic DNA and contained nine human type I hair keratin genes, one transcribed hair keratin pseudogene, as well as one orphan exon. The hair keratin genes are 4–7 kbp in size, exhibit intergenic distances of 5–8 kbp, and display the same direction of transcription. With one exception, all hair keratin genes are organized into 7 exons and 6 positionally conserved introns. On the basis of sequence homologies, the genes can be grouped into three subclusters of tandemly arranged genes. One subcluster harbors the highly related genes hHa1, hHa3-I, hHa3-II, and hHa4. A second subcluster of highly related genes comprises the novel genes hHa7 and hHa8, as well as pseudogene ΨhHaA, while the structurally less related genes hHa6, hHa5, and hHa2 are constituents of the third subcluster. As shown by reverse transcription-polymerase chain reaction, all hair keratin genes, including the pseudogene, are expressed in the human hair follicle. The transcribed pseudogene ΨhHaA contains a premature stop codon in exon 4 and exhibits aberrant pre-mRNA splicing. Evolutionary tree construction reveals an early divergence of hair keratin genes from cytokeratin genes, followed by the segregation of the genes into the three subclusters. We suspect that the 190-kbp domain contains the entire complement of human type I hair keratin genes.

The keratin multigene family comprises the cytokeratins or soft α-keratins, which are expressed in the various types of epithelia, and the hair keratins or hard α-keratins, involved in the formation of hard keratinized structures. Both can be divided into type I (acidic) and type II (basic-neutral) proteins that form the 10-nm intermediate filament network of epithelial cells by obligatory association of equimolar amounts of type I and type II keratins (1,2). Disturbances of intermediate filament formation through deleterious mutations in keratins can lead to a weakening of the structural integrity of the respective epithelial cells, resulting in hereditary disorders of skin, mucosa, nail, or hair (3-7). Although initial studies of hair keratin proteins of several species indicated the existence of eight major hair keratins, four type I members, termed Ha1-Ha4, and four type II members, termed Hb1-Hb4, as well as of one minor hair keratin pair, Hax/Hbx (8-11), it has recently been shown that the hair keratin family is distinctly more complex. In man, sequences of seven type I hair keratins, hHa1, hHa2, hHa3-I, hHa3-II, hHa4, 1 hHa5, hHa6 (previously designated hHRa1) 1 and four type II hair keratins, hHb1, hHb3, hHb5, and hHb6, have been elucidated by molecular cloning, and their differential expression in the hair matrix, cortex, and cuticle of the hair follicle has been shown (12-17). To date, complete sequences for one human type I and two type II hair keratin genes have been described (16,18). Fluorescence in situ hybridization analyses to human metaphase chromosomes have shown that the type I hair keratin gene hHa2 was located on chromosome 17q12-q21, whereas the genes of the type II hair keratins hHb1 and hHb6 were found on chromosome 12q13 (13,18).
With the exception of the human type I cytokeratin K18 gene, which maps to chromosome 12q13 (19), the genes for a large number of human type I and type II cytokeratins are also located on chromosome 17q12-q21 and 12q13, respectively (20-23), thus indicating the existence of keratin type-specific gene clusters on distinct chromosomes of the human genome.
In the course of the characterization of the human type I hHa2 hair keratin gene on a λ contig, 2 we discovered a partial genomic sequence of the hHa5 gene lying approximately 8.0 kbp upstream of the hHa2 gene (16). The present study attempts a bidirectional expansion and analysis of the hHa2/hHa5 gene region by means of combined λ and P1 artificial chromosome (PAC) cloning in order to learn more about the organization of the human type I hair keratin gene cluster on chromosome 17q12-q21 and to possibly detect novel type I hair keratin genes.

MATERIALS AND METHODS
Isolation of Clone ghkI5.7-The characterization of the hHa2 gene, as well as the partial characterization of the hHa5 gene on clones ghkI2.12 and ghkI2.17, respectively, has been described previously (16). In order to obtain the 5′ region of hHa5, a 0.3-kbp PstI-XhoI (linker) fragment of the hHa5 cDNA (16) was used as a probe for the screening of the human genomic DNA library cloned into λDashII (Stratagene, Heidelberg, Germany). The isolated clone, termed ghkI5.7, was subcloned as a 4.5-kbp HindIII-NotI fragment, which contained the majority of the hHa5 gene, and a 9.0-kbp HindIII-NotI fragment, which harbored the complete sequence for the human hHa6 hair keratin gene. Complete characterization of this clone was performed as described previously (16).
Isolation of the Human PAC Clones PAC1 and PAC3-PCR fragments generated by means of specific 3′-primer pairs from the noncoding regions of the human hHa3-II cDNA (15) as well as from the hHa6 gene (see clone characterization and Table I), were used to screen an arrayed human genomic PAC library. This library was derived from Sau3a-digested human genomic DNA cloned into the BamHI site of the PAC vector AD10SacBII (library and vector from Genome Systems Inc., St. Louis, MO). PCR screening with the hHa3-II probe resulted in the isolation of two clones (termed PAC1 and PAC2) that appeared 80% identical as evaluated by NdeI-NotI (linker) restriction enzyme (RE) digestion of each PAC clone. The larger clone, PAC1, was further characterized. PCR screening with the hHa6 primer pair resulted in the isolation of one PAC clone, termed PAC3. PCR analysis was performed using the Boehringer Expand Long PCR System (Boehringer, Mannheim, Germany) according to the manufacturer's instructions. The sequences of the primer pairs and PCR conditions are listed in Table I and below.
* This work was supported by the Deutsche Forschungsgemeinschaft, Grant Schw 539/1-3. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact. The nucleotide sequence(s) reported in this paper has been submitted to GenBank™.
Analysis of PAC Clones
PCR Screening-Clones PAC1 and PAC3 were initially analyzed for the presence of human type I hair keratin genes by PCR-based analysis using specific primers derived from cDNA clones of hHa1 (12), hHa2 (13), hHa3-II (15), hHa4, 1 hHa5 (16) and hHa6 1 (see also, clone analysis), as well as from a genomic hHa3-I clone. 3 PCR conditions were the same as used for PAC clone isolation.
Identification of RE Fragments Containing Hair Keratin Gene Loci-The PAC1 and PAC3 DNA was digested with NdeI and NotI (linker) and separated on 0.8% agarose long gels. The separated RE fragments were Southern blotted overnight onto Hybond N+ nylon membranes using 0.4 M NaOH, 0.6 M NaCl as a transfer solution and 0.5 M Tris/HCl, pH 7.4, for neutralization, followed by UV cross-linking of the DNA (Stratalinker, Stratagene, La Jolla, CA). The RE fragments containing conserved α-helical regions of human hair keratins were detected by low stringency hybridization of these blots with an α-helical PCR probe of the hHa1 cDNA (12) using procedures described in the ECL random priming DNA labeling, hybridization, and detection system (Amersham Pharmacia Biotech). Hybridization was performed at 60°C overnight; post hybridization washes were carried out once with 1× SSC plus 0.1% SDS and once with 0.5× SSC plus 0.1% SDS for 20 min at 62°C each. This same procedure was also used for analysis of the isolated NdeI subclones described below.
Cloning of RE Fragments-Multiple RE fragments of NdeI-NotI cut PAC1 and PAC3 DNA were separated on 0.8% agarose long gels. Individual bands or groups of bands were excised from the gel, purified from agarose (agarose gel extraction kit, Boehringer, Mannheim, Germany), quantitated, and ligated into a dephosphorylated, NdeI-cut pGEM5Z sequencing vector. Insert-containing colonies were selected by blue/white screening. Miniprep DNA was prepared and RE digested with NdeI or with PstI and StyI in order to determine if more than one individual fragment was present in each ligation. Each unique subclone was then end-sequenced, and maxiprep DNA (Qiagen, Hilden, Germany) was prepared. This procedure resulted in the cloning of 80-90% of the RE fragments present on the PAC clones.
RE Fragment Orientation and Physical Map Construction-1) Generation of specific PAC subclone data bases: All NdeI subclones isolated were end-sequenced, and a data base of fragment end sequences was generated. Orientation of the fragments to each other was performed by generation of sequencing primers derived from the reverse and complement of each end sequence in order to sequence across the adjacent NdeI site via PAC DNA sequencing (termed crossover sequencing). Comparison of this crossover sequence by the FASTA program (Heidelberger Unix Sequence Analysis Resource, HUSAR) with the data base of end fragments allowed the discovery of the correct neighboring end fragment adjoining the end sequence analyzed. 2) Final contig closure: Initially, closure of the remaining open DNA fragments was attempted by PAC sequencing (primer walking) using the crossover sequences obtained from end fragments that had not found an adjacent partner. If no new NdeI site was found after sequencing 1 kbp of PAC DNA, then combinations of crossover oligos were used as primer pairs in long range PCR reactions using the Expand Long PCR System (Boehringer, Mannheim, Germany). The agarose gel separated PCR products were excised and end-sequenced by means of the crossover primers, used to generate the PCR product. A PCR product was considered genuine only when it exhibited the correct sequences from both sides compared with the initial PAC DNA sequence. These experiments were repeated twice.
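The neighbor search behind crossover sequencing can be illustrated with a minimal Python sketch. It rests on several assumptions that are not part of the original procedure: exact prefix matching stands in for the FASTA comparison, every end sequence in the data base is stored reading inward from its NdeI site, and the sequences and the names `find_neighbor`, `end_db`, and `min_overlap` are hypothetical toy values.

```python
NDEI_SITE = "CATATG"  # NdeI recognition sequence


def find_neighbor(crossover_read, end_db, min_overlap=12):
    """Return the key of the end sequence that continues the read past the NdeI site.

    Exact prefix matching is used here as a simple stand-in for a FASTA search.
    Returns None if the read never reaches an NdeI site or no end sequence matches.
    """
    site = crossover_read.find(NDEI_SITE)
    if site == -1:
        return None  # no site reached; primer walking would have to continue
    beyond = crossover_read[site:]  # read portion starting at the NdeI site
    for end_id, end_seq in end_db.items():
        n = min(len(beyond), len(end_seq))
        if n >= min_overlap and beyond[:n] == end_seq[:n]:
            return end_id
    return None


# Toy data base of subclone end sequences (invented, not from the PAC clones).
end_db = {
    ("sub1", "left"): "CATATGGCTTACGGATCCAA",
    ("sub2", "left"): "CATATGTTGACCGTAGGCTA",
    ("sub2", "right"): "CATATGAACCTTGGGTCAAC",
}

# A read primed inside sub1 that runs across the NdeI site into its neighbor.
read = "GGATTACCA" + "CATATGTTGACCGTAGG"
neighbor = find_neighbor(read, end_db)
print(neighbor)
```

In this toy case the read is assigned to the left end of `sub2`, which would place `sub2` directly adjacent to `sub1` on the physical map.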
Oligonucleotide Primers-PCR primer pairs and sequencing primers were selected using the Oligo 5.0 software program (MedProbe, Oslo, Norway). Repetitive sequences were generally eliminated by comparison with the repetitive element data base REPBASE from HUSAR.
DNA Sequencing-Plasmid DNA, PAC DNA, and PCR products were sequenced using a 33P Chain Termination Cycle Sequencing Kit (Amersham Pharmacia Biotech) according to the manufacturer's instructions. 0.3 µg (plasmid), 3 µg (PAC), or 0.1-0.7 µg (PCR product) of DNA was used per sequencing with 10 pmol of the respective primer. Cycle sequencing conditions were 95°C for 30 s; 55°C for 30 s; 72°C for 90 s; 45 cycles were performed. Sequencing reactions were separated on 5.7% polyacrylamide, 8 M urea gels. At a later date, fluorescence dye terminator cycle sequencing was performed using either the Amplitaq FS (Applied Biosystems, Weiterstadt, Germany) or Thermosequenase (Amersham Pharmacia Biotech) cycle sequencing kits. 1 µg (plasmid), 5 µg (PAC), or 0.2-1.2 µg (PCR products) of DNA were used with 10 pmol of the respective primer. Cycle sequencing conditions were 96°C for 15 s; 55°C for 8 s; 60°C for 240 s; 24 or 30 cycles were performed. The fluorescence-labeled sequencing reactions were analyzed on an ABI 310 capillary electrophoresis sequencing apparatus (Applied Biosystems, Weiterstadt, Germany).
DNA Sequence Analysis-In general, DNA sequence analysis was performed with several programs contained in the Wisconsin GCG package as provided by HUSAR. Contig generation and sequence correction was performed using the contig analysis software GELENTER or the STADEN program. Amino acid multialignments were performed using the software program CLUSTAL. Enhancer sequence recognition sites were determined using the TRANSFAC data base (24).
Evolutionary Tree Analysis-The 74-amino acid residues of the 2B α-helical subdomain of several human type I cytokeratins (K9-17, 19-20) and of all type I hair keratins presented here were used for multiple amino acid alignment using the software program CLUSTAL (HUSAR). Evolutionary tree construction was achieved using the CLUSTREE program (HUSAR). In order to further ensure the statistical significance of the produced data, distance matrices for each individual tree were generated using the program DISTANCES (HUSAR) and evaluated using the software program SPLITS (HUSAR).
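To make the notion of a distance matrix concrete, the following Python sketch computes pairwise p-distances (the fraction of differing residues at ungapped positions) from pre-aligned sequences. This is only an illustration of the general idea; the distance measure actually applied by the HUSAR programs is not stated here, and the short sequences and the names `seqA`, `seqB`, and `seqC` are invented placeholders, not keratin 2B subdomains.

```python
def p_distance(a, b):
    """Fraction of differing residues over alignment columns without gaps."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(1 for x, y in pairs if x != y) / len(pairs)


def distance_matrix(aligned):
    """Return {(name1, name2): p-distance} for every ordered pair of sequences."""
    names = list(aligned)
    return {(m, n): p_distance(aligned[m], aligned[n]) for m in names for n in names}


# Placeholder alignment (toy residues only).
aligned = {
    "seqA": "LEQQNKLLET",
    "seqB": "LEQQNKILET",
    "seqC": "LDQ-NRILES",
}

matrix = distance_matrix(aligned)
for (m, n), d in matrix.items():
    if m < n:
        print(f"{m} vs {n}: {d:.3f}")
```

A matrix of this kind is the typical input for distance-based tree building; smaller values indicate more closely related sequences.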
Reverse Transcription-PCR Analysis-Total RNA was extracted from the bulbs of 10-20 freshly plucked human hair follicles using the RNeasy kit (Qiagen, Hilden, Germany). The RNA was reverse transcribed with Superscript II according to the manufacturer's protocol (Life Technologies, Inc., Eggenstein, Germany). PCR amplification of the generated cDNA was performed using the Expand Long PCR System (Boehringer, Mannheim, Germany) and the primer pair specific for each individual keratin (see Table I). PCR conditions were 94°C for 2 min, subsequently 94°C for 10 s, X°C for 30 s, 68°C for 2 min for 10 cycles, then an additional 20 cycles using the same conditions but with 20 s of added elongation time for each cycle. X = annealing temperature listed in Table I. The PCR products were separated on 1% agarose gels. DNA purification and sequencing was performed as stated above.
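The elongation schedule of this PCR program can be written out with a short Python sketch. It assumes that "20 s of added elongation time for each cycle" means a cumulative increment, i.e. each of the 20 additional cycles is 20 s longer than the one before; this reading, and the function name `elongation_times`, are assumptions rather than details given in the text.

```python
def elongation_times(base_s=120, fixed_cycles=10, extended_cycles=20, increment_s=20):
    """Elongation time (s) per cycle: constant first, then growing by a fixed step."""
    times = [base_s] * fixed_cycles
    # Assumed interpretation: the increment accumulates from cycle to cycle.
    times += [base_s + increment_s * i for i in range(1, extended_cycles + 1)]
    return times


times = elongation_times()
print(len(times), "cycles")
print("cycle 11 elongation:", times[10], "s")
print("cycle 30 elongation:", times[-1], "s")
print("total elongation time:", sum(times) / 60, "min")
```

Under this reading the last cycle would elongate for 520 s and the elongation steps alone would add up to 130 min.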

RESULTS
Screening of PAC Clones, Identification of 10 Human Type I Hair Keratin Genes and One Orphan Exon-The combination of PAC and λ cloning allowed the isolation of a 190-kbp contiguous stretch of human genomic DNA harboring 10 human type I hair keratin genes (Fig. 1). The originally described λ contig (16), together with the newly isolated λ clone, contains the entire hHa2, hHa5, and hHa6 genes and covers part of the PAC3 clone, a 135-kbp genomic fragment obtained by PAC bank screening using specific PCR primers for the hHa6 gene (see Table I). PAC3 contains eight hair keratin genes (hHa6, hHa5, hHa2, hHa8, hHa7, ΨhHaA, hHa1, and hHa4) (Fig. 1). PAC1, which was isolated using specific PCR primers for the hHa3-II cDNA (see Table I) (15), is a 125-kbp genomic fragment harboring six hair keratin genes (hHa7, ΨhHaA, hHa1, hHa4, hHa3-II, and hHa3-I). PAC3 and PAC1 have an overlap of ~64 kbp, both clones exhibiting the hHa7, ΨhHaA, hHa1, and hHa4 genes in common (Fig. 1). The size of the hair keratin genes ranges from 4.2 to 7.5 kbp, and the genes are separated from each other by 5.5-18.4 kbp (Fig. 2). Type I hair keratin genes hHa6, hHa2, hHa8, hHa7, ΨhHaA, hHa1, hHa4, hHa3-II, and hHa3-I are each divided into 7 exons and 6 introns. This is in contrast to the exon/intron organization of type I cytokeratin genes, which exhibit an additional, positionally highly variable intron in the region coding for the tail domain (2). However, both gene families show exceptions to these rules, in that cytokeratin gene K9 possesses an 8th (25) and hair keratin gene hHa5 a 7th intron (16), both being located in the 3′-noncoding region of the genes. Apart from the exon/intron boundaries of intron 7 of the hHa5 gene, all other exon/intron boundaries in type I hair keratin genes are positionally conserved, and all genes are transcribed in the same direction (Fig. 2; sequence accession numbers are presented in Table II).
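The way the two PAC clones combine into one gene set can be checked with a few lines of Python. The gene lists are copied from the text in the order given there; the span estimate uses the rounded clone sizes and the approximate overlap, so it is only a rough consistency check and not a measurement.

```python
# Gene content of each PAC clone, in the order listed in the text.
PAC3 = ["hHa6", "hHa5", "hHa2", "hHa8", "hHa7", "ΨhHaA", "hHa1", "hHa4"]
PAC1 = ["hHa7", "ΨhHaA", "hHa1", "hHa4", "hHa3-II", "hHa3-I"]

# Genes present on both clones (the overlap region).
shared = [g for g in PAC3 if g in PAC1]

# Merged gene list: PAC3 genes followed by the genes found only on PAC1.
contig = PAC3 + [g for g in PAC1 if g not in PAC3]

# Rough span from rounded sizes (kbp): 135 + 125 minus the ~64-kbp overlap.
approx_span_kbp = 135 + 125 - 64

print("shared genes:", shared)
print("genes on contig:", len(contig))
print("approximate span:", approx_span_kbp, "kbp")
```

The merged list contains the 10 genes reported for the contig, and the rough span of about 196 kbp is close to the stated 190 kbp given that the input values are approximate.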
In addition to these hair keratin genes, we identified a 263-bp DNA sequence, located 12.2 kbp upstream of the hHa8 gene, which exhibited a 68% sequence homology with the region coding for the α-helical part of exon 1 in human type I hair keratin genes (Fig. 2). Amino acid translation showed that the transcriptional direction of the sequence was the same as that of all other hair keratin genes located on the contig. However, the resulting protein sequence displayed only low homology with the characteristic helix initiation motif of type I cyto- and hair keratins. Moreover, even under low stringency conditions, an hHa1 cDNA-derived α-helical probe did not hybridize to the 10.5-kbp NdeI fragment immediately downstream of the critical sequence (Fig. 2), thus indicating that the region 3′ to the putative exon does not encode further α-helical sequences. Finally, sequencing of a further 2 kbp downstream of the 263-bp region revealed sequences completely unrelated to those coding for α-helical segments. It therefore appears that the critical 263-bp sequence represents an orphan exon, a phenomenon seen previously also in a cosmid clone harboring sheep type II wool keratin genes (26).
Division of the Type I Hair Keratins into Three Subgroups-Multiple comparisons of the amino acid sequences derived from the 10 hair keratin genes led to a striking sorting of the keratin proteins into three groups, A and B being structurally highly related hair keratins, and a third group C, containing structurally more heterogeneous hair keratins.
Group A encompassed hair keratins hHa1, hHa3-I, hHa3-II, 4 and the newly described hHa4 hair keratin, which exhibited an overall amino acid identity of 89% (Fig. 3). Besides the rod domains, the homologous regions also comprised the entire head domains, which are strictly conserved in size in this group, as well as carboxyl-terminal sequences of 10 amino acid residues adjacent to the end of the α-helical rod domains (Fig. 3). Hair keratin hHa1 has previously been found to be the ortholog of the murine mHa1 hair keratin (12,27), whereas the new hair keratin hHa4 is orthologous with the murine mHa4 hair keratin (28). Human hair keratins hHa1 and hHa4 possess 417 and 395 amino acid residues, respectively, and have calculated molecular masses of 47236 and 44719 Da. Within the group A hair keratins, hHa3-I and hHa3-II exhibit by far the highest sequence identity with each other (93.3%). Both contain 405 amino acid residues and have calculated molecular masses of 45934 and 46213 Da. They are highly related to the partial sequence of the murine mHa3 hair keratin (29). Based on a common sequence motif PIG(S/P)CVTNPC in their carboxyl-terminal domains (see Fig. 3), we speculate that hHa3-II may represent the ortholog of the murine mHa3 hair keratin (29).
Group B contained hair keratins ΨhHaA, hHa7, and hHa8 (Fig. 4). These are novel hair keratins, which have not yet been described in other species. Their overall sequence homology is in the range of 81%; however, due to an almost completely identical head domain, comparison of hHa7 and hHa8 only leads to a much higher homology value of 92.6%. Hair keratins hHa7 and hHa8 comprise 408 and 415 amino acid residues, respectively, with calculated molecular masses of 45525 and 46378 Da. The amino acid translation of the ΨhHaA gene would result in a truncated keratin sequence due to a premature stop codon in exon 4 of the gene (position 4470-4472), leading to an early translational arrest near the end of the α-helical 2A subdomain of the derived protein (Fig. 4, arrow). The actual splicing of the ΨhHaA gene, however, can be quite different (see below and "Discussion").
Sequentially, the remaining type I hair keratins hHa2, hHa5, and the newly described hHa6 hair keratin of group C are more heterogeneous than the other two groups. Due to completely different head and tail domains, their degree of identity is only about 70% (Fig. 5). The three hair keratins contain 448 (hHa2), 426 (hHa5), and 468 (hHa6) amino acid residues and have calculated molecular masses of 50,318, 47,585, and 52,246 Da, respectively. Hair keratin hHa2 has previously been identified as the ortholog of murine hair keratin mHa2 (13,29). In contrast, orthologs of hHa5 have not yet been described in other species. The new hHa6 hair keratin, which represents the largest member of all type I hair keratins described here, is probably identical with the minor type I hair keratin Hax (10) and may be orthologous to a type I mouse keratin, which has previously been designated "hair-related keratin" mHRa1 (30).

Expression of the Type I Hair Keratin Genes in the Hair Follicle-Previously, complete cDNA sequences for the human hair keratins hHa1, hHa2, hHa5, and partial cDNA sequences for hHa3-II have been elucidated, and the expression of these keratins in the hair follicle has been shown (12,13,15,16,31,32). 1 In order to determine whether the newly characterized hair keratin genes hHa3-I, hHa4, hHa6, ΨhHaA, hHa7, and hHa8 were also expressed in the hair follicle, RNA, isolated from the bulbs of plucked human anagen hairs, was reverse transcribed, and the obtained cDNAs were amplified by means of primers that spanned the entire mRNAs of the corresponding hair keratins (Table I). As shown in Fig. 6, in all cases amplification products could be obtained whose sizes correlated with the calculated sizes of the respective mRNAs (Table I). Subcloning and sequencing of the hHa3-I, hHa4, hHa6, hHa7, and hHa8 cDNAs revealed complete sequence identity with the corresponding hair keratin genes. This was, however, not observed for the generated ΨhHaA cDNA. This cDNA completely lacked the 83-bp long exon 2 and, notwithstanding correct donor and acceptor splice site motifs, retained the 124 bp of intron 4 (results not shown). Like the genomic sequence, the ΨhHaA cDNA retained the premature stop codon in exon 4.
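The effect of these two splicing aberrations on the reading frame follows from simple arithmetic on the lengths given above, sketched here in Python. A length change preserves the frame only if it is a multiple of 3; the helper name `is_in_frame` is introduced for illustration only.

```python
def is_in_frame(length_change_bp):
    """A change in mRNA length keeps the reading frame only if divisible by 3."""
    return length_change_bp % 3 == 0


exon2_skipped = -83      # loss of the 83-bp exon 2
intron4_retained = 124   # retention of the 124-bp intron 4
net_change = exon2_skipped + intron4_retained

print("exon 2 loss in frame:", is_in_frame(exon2_skipped))
print("intron 4 retention in frame:", is_in_frame(intron4_retained))
print("net change:", net_change, "bp, in frame:", is_in_frame(net_change))
```

Neither 83 nor 124 is a multiple of 3, so the loss of exon 2 alone already shifts the reading frame downstream of exon 1, and the net change of 41 bp does not restore it.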

DISCUSSION
In the present study we have performed a systematic analysis of the genomic region upstream and downstream of a previously described type I human hair keratin locus on chromosome 17q12-q21 (13), which harbored the entire hHa2 hair keratin gene as well as a partial sequence of the hHa5 gene, both oriented in the same transcriptional direction (16). Using a combination of λ and PAC cloning, we were able to identify eight further hair keratin genes, one upstream of the hHa5 gene and seven downstream of the hHa2 gene, all being located within about 140 kbp of genomic DNA. On the basis of previously isolated complete or partial cDNA clones for the human type I hair keratins hHa1 (12), hHa3-II (15), hHa4 1 and hHa6 1 as well as an unpublished genomic sequence of hHa3-I, 3 five of the genes could be identified as encoding these hair keratins, whereas three genes, ΨhHaA, hHa7, and hHa8, were new hair keratin genes. Multiple sequence homology comparisons of the hair keratin proteins encoded by the 10 genes led to a striking sorting into two distinct groups A and B of highly related hair keratins, which, besides in their rod domains, exhibited strong sequence and size similarities in their head domains and, to a lower degree, also in their tail domains. Group A comprised the hair keratins hHa1, hHa3-I, hHa3-II, and hHa4, whereas group B contained the novel hair keratins ΨhHaA, hHa7, and hHa8. The remaining group C hair keratins hHa2, hHa5, and hHa6 displayed essentially rod domain homology.
This grouping of hair keratin proteins on the basis of sequence and size similarities was mirrored by the spatial organization of the corresponding genes in the genome. Thus, the genes for the structurally related group A and B hair keratins formed subclusters at one extremity and the center of the contig, respectively, whereas the genes for the structurally unrelated group C hair keratins were subclustered at the other end of the contig.
There is evidence that suggests that keratins encoded by the genes of a distinct subcluster may be functionally related. Expression studies for the structurally heterogeneous group C hair keratins revealed that each of these keratins is expressed differently in the hair follicle. The expression of hHa5 begins in the hair bulb, immediately above the cells, which border the dermal papilla, and subsequently extends over the entire matrix compartment (16). A recently described type II hair keratin hHb5 shows the same expression pattern (17), indicating that hHa5/hHb5 may represent a matricial hair keratin pair. Hair keratin hHa2, just as its murine counterpart mHa2, is specifically expressed in hair cuticle (29). The expression begins low down in the bulb in a single cell layer which encases the hHa5/hHb5 expressing matrix compartment. 1 Hair keratin hHa6 is mainly expressed in the cortex of the hair shaft, but weak expression can also be seen in subjacent matrix cells. 1 Thus, notwithstanding their locally different patterns of expression, group C hair keratins hHa2, hHa5, and hHa6 are, however, alike in their onset of synthesis in the lowermost bulb region and therefore stand for the early phase of hair differentiation.
In contrast, expression studies for the group A hair keratins hHa1, hHa3-I, hHa3-II, and hHa4 have shown that each of them is expressed in the cortex of the hair shaft (31,32). 1 Cortical expression has also been demonstrated for the murine orthologs mHa1, mHa3, and mHa4 (28,29,33). Thus these hair keratins are obviously involved in the late phase of terminal differentiation of the hair fiber. Interestingly, expression in the hair cortex alone has recently also been found for three highly related human type II hair keratins hHb1, hHb3, and hHb6 and their sheep orthologs K2.9, K2.10, and K2.11 (17,34). Whereas the organization of the human type II cortex keratin genes is not yet known, studies by Powell and Beltrame (35) have shown that sheep wool genes K2.9, K2.10, and K2.11 are also genomically arranged in a compact 40-kbp cluster (35).
At present, neither the site of expression of the new group B hair keratins ΨhHaA, hHa7, and hHa8 in the hair follicle nor their function in the formation of the hair fiber is known. In this group, gene ΨhHaA is of special interest. Just as all other type I hair keratin genes, this gene exhibits 7 exons and 6 introns. The latter are positioned normally and do not show irregularities in their donor and acceptor splice sites. However, the ΨhHaA gene contains a premature stop codon in exon 4. Premature stop codons and frameshift mutations in the coding region have also been found in two nontranscribed pseudogene variants of the human type I cytokeratins K17 and K16 (20, 36-38) which, in addition, are located on chromosome 17p11-p12 instead of 17q12-q21 (20,38). In contrast, the ΨhHaA gene is present within the 17q12-q21 region and clearly transcribed into mRNAs. However, the identified transcripts not only contained the premature stop codon in the region coding for exon 4, but, in addition, they completely lacked exon 2 and retained the entire sequence of intron 4. A hypothetical protein derived from this aberrantly processed mRNA would end in exon 3 due to another premature stop codon that results from a frameshift induced by the loss of exon 2. Seeking to confirm this unexpected mRNA processing, we have analyzed hair follicle RNA from further individuals and found an even more complex situation. In most cases we observed no ΨhHaA transcripts at all or the transcript described above, either with or without the concomitant occurrence of a minority of correctly spliced ΨhHaA mRNA molecules, still containing, however, the premature stop codon in exon 4. Thus, both mRNA species would encode truncated and, hence, pathogenic hair keratins which should be associated with a hair disorder (39). All in all, these data indicate that the ΨhHaA gene is a pseudogene. However, unlike the untranslated K16 and K17 pseudogenes, it not only represents the first example of an expressed keratin pseudogene, but it is also the first to exhibit localization within the normal type I keratin gene locus on chromosome 17q12-q21.
The plethora of new data on human type I hair keratins enables the examination of the evolutionary relationship between human hair and cytokeratins. For this purpose, the highly conserved α-helical 2B subdomains of type I hair and cytokeratins were used for multiple alignment and evolutionary tree construction. Fig. 7 clearly shows that human type I hair keratins diverged early in evolution from the type I cytokeratins. On the other hand, divergence among hair keratins occurred later than among cytokeratins. The branch leading to the novel, structurally related hair keratins ΨhHaA, hHa7, and hHa8 of group B separated at an early date from the main branch, the segregation of the corresponding genes into two functional genes and one pseudogene occurring quite recently. The observed type of segregation of these hair keratins does not speak for the possibility that the ΨhHaA gene may have evolved through a duplication of either the hHa7 or the hHa8 gene, but rather from a common precursor. We have previously shown that human and primate hair keratin genes are highly conserved (40). Using PCR primers for exon 4 of the human ΨhHaA gene and genomic DNA from two unrelated chimpanzees as template, we were indeed able to amplify this exon region of the corresponding chimpanzee gene. In both cases, however, exon 4 did not contain the premature stop codon TGA of the human gene. Instead, as in the functional human hHa7 and hHa8 genes, a CGA triplet in the chimpanzee gene encoded an arginine residue. Although further investigations are needed to confirm that the primate gene is functional, these data clearly show that the C to T mutation, leading to the premature exon 4 stop codon in the human ΨhHaA gene, must have occurred very early in human evolution. Fig. 7 further shows that the branch leading to the remaining type I hair keratins divided continuously during evolution, first segregating in a stepwise manner into the genes of the three structurally poorly related group C hair keratins hHa6, hHa5, and hHa2, involved in the early phase of hair differentiation, and culminating in the generation of the late group A cortex keratin genes, whose diversification into multiple, structurally related members was also a rather recent event.
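The consequence of the C to T change in the exon 4 codon can be verified against the standard genetic code with a short Python sketch. The codon table is the standard nuclear code; the helper `point_mutate` is a name introduced here for illustration.

```python
# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def point_mutate(codon, position, new_base):
    """Return the codon with the base at the given 0-based position replaced."""
    return codon[:position] + new_base + codon[position + 1:]


chimp_codon = "CGA"                               # triplet in the chimpanzee gene
human_codon = point_mutate(chimp_codon, 0, "T")   # C to T at the first position

print(chimp_codon, "->", CODON_TABLE[chimp_codon])   # R = arginine
print(human_codon, "->", CODON_TABLE[human_codon])   # * = stop
```

A single transition at the first codon position is thus sufficient to convert the arginine codon CGA into the stop codon TGA.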
Within the 190 kbp of genomic DNA analyzed, the 10 type I hair keratin genes are located within a domain of only 140 kbp. The close linkage of the genes suggests the possibility of shared regulatory elements. A hair follicle specific locus control region could be responsible for cell type-specific expression, while transcription factors binding in the intergenic regions could regulate both the identical and differential expression of genes grouped in the three subclusters. In an attempt to identify possible binding sites for transcription factors, we have analyzed the DNA sequence of each hair keratin promoter starting 20 bp downstream of the putative TATA box and continuing 500 bp upstream of this site. With the exception of the hHa7 gene and the ΨhHaA pseudogene, all promoter regions exhibit the positionally conserved binding site for LEF1, a factor previously shown to play a crucial role in hair follicle development (41) (Fig. 8). LEF1-overexpressing mice display hair follicle abnormalities and disturbances in the spatial orientation of the follicles (41), while LEF1 knockout mice are hairless but develop rudimentary hair follicles (42). Besides the hHa3-I gene promoter, the promoters of all hair keratin genes grouped in subclusters A and C are alike in the presence of the enhancer element for brn2 (human n-oct3), a member of the POU III family of transcriptional activators regulating the developmental expression of neuroectodermal genes in mice (43). A role for POU domain containing transcription factors in epidermal differentiation has recently been proposed for oct6 (44). Moreover, POU II family members skn1a and skn1i exhibit differentiation specific expression in both epidermis and hair follicles (45). The newly described functional hair keratin genes hHa7 and hHa8, but not the ΨhHaA pseudogene, specifically exhibit positionally conserved binding sites for the transcription factor ap1 as well as a binding half-site for the peroxisomal proliferator activator receptor, ppar, a member of the retinoic acid/steroid receptor superfamily (46) (Fig. 8). In skin, ppar may play a role in the maintenance of epidermal lipid barrier function (47,48); a possible function of this factor in the expression of hair keratins is presently unknown. Ap1 sites in regions downstream of several cytokeratin genes have been shown to play a role in the transcription of these genes (49-51). Finally, DNA binding sites present in single hair keratin genes and not discussed here are cited in the legend to Fig. 8.
Fig. 8 (legend, in part). Black boxes indicate perfect matches with sequences present in the TRANSFAC data base (24). Gray boxes indicate sequences with one mismatch. The enhancer element name is located below each box. Enhancer element sequences present in the graph but not mentioned under "Discussion" are cp1 and cp2 (59-61), yy1 (62), yi (63), AARCCAAA (64), heat shock factor (65), sp1 (66), maf (67), ap2 (68), hox1.3 (69) and p300 (70).
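The distinction between perfect matches and matches with one mismatch can be illustrated by a simple sliding-window scan in Python. This is a minimal sketch and not the TRANSFAC search itself: the motif and the promoter sequence are arbitrary placeholders (not a real binding site or keratin promoter), and the function name `scan_motif` is hypothetical.

```python
def scan_motif(sequence, motif, max_mismatches=1):
    """Return (position, matched window, mismatches) for every window of the
    sequence that differs from the motif at no more than max_mismatches bases."""
    hits = []
    for i in range(len(sequence) - len(motif) + 1):
        window = sequence[i:i + len(motif)]
        mismatches = sum(1 for a, b in zip(window, motif) if a != b)
        if mismatches <= max_mismatches:
            hits.append((i, window, mismatches))
    return hits


# Placeholder motif and sequence, chosen only to demonstrate the scan.
motif = "ACGTTGCA"
promoter = "GGACGTTGCATTACGTAGCAGG"

hits = scan_motif(promoter, motif)
for position, window, mismatches in hits:
    kind = "perfect match" if mismatches == 0 else "one mismatch"
    print(position, window, kind)
```

In the terms of the figure, a hit with zero mismatches would correspond to a black box and a hit with one mismatch to a gray box.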
The absence of α-helical hybridization of both the 10-kbp region upstream of the hHa6 gene and the 30-kbp region downstream of the hHa3-I gene excludes the localization of further keratin genes on the 3′ and 5′ extremities of the 190-kbp contig. We therefore believe that, in particular, the 30-kbp keratin gene-free region downstream of the hHa3-I gene may delineate the physical 3′ boundary of the type I hair keratin gene domain. Compiled data from several laboratories indicate that the genes for the six human type I cytokeratins K19, K15, K13, K14, K17, and K16 are clustered on chromosome 17q12-q21 within a region of at least 65 kbp (37, 52-54) or at most 150 kbp (55). In addition, single gene sequences for the remaining type I cytokeratins K10, K9, K12, and K20 have been elucidated (25, 56-58). The combination of these data, coupled with our data on the hair keratin gene domain, should not only allow a final closure of the human type I keratin locus via PAC chromosomal walking, but also answer the question whether the type I hair gene domain presented here comprises the full complement of human type I hair keratin genes.