Characterization of a cluster of human high/ultrahigh sulfur keratin-associated protein genes embedded in the type I keratin gene domain on chromosome 17q12-21.

Low stringency screening of a human P1 artificial chromosome library using a human hair keratin-associated protein (hKAP1.1A) gene probe resulted in the isolation of six P1 artificial chromosome clones. End sequencing and EMBO/GenBank(TM) data base analysis showed these clones to be contained in four previously sequenced human bacterial artificial chromosome clones present on chromosome 17q12-21 and arrayed into two large contigs of 290 and 225 kilobase pairs (kb) in size. A fifth, partially sequenced human bacterial artificial chromosome clone data base sequence overlapped and closed both of these contigs. One end of this 600-kb cluster harbored six gene loci for previously described human type I hair keratin genes. The other end of this cluster contained the human type I cytokeratin K20 and K12 gene loci. The center of the cluster, starting 35 kb downstream of the hHa3-I hair keratin gene, contained 37 genes for high/ultrahigh sulfur hair keratin-associated proteins (KAPs), which could be divided into a total of 7 KAP multigene families based on amino acid homology comparisons with previously identified sheep, mouse, and rabbit KAPs. To date, 26 human KAP cDNA clones have been isolated through screening of an arrayed human scalp cDNA library by means of specific 3'-noncoding region polymerase chain reaction probes derived from the identified KAP gene sequences. This screening also yielded four additional cDNA sequences whose genes were not present on this gene cluster but belonged to specific KAP gene families present on this contig. Hair follicle in situ hybridization data for single members of five different KAP multigene families all showed localization of the respective mRNAs to the upper cortex of the hair shaft.

The mature hair fiber is made up mainly of two major cell types. An external sheath of overlapping flattened cuticle cells, often only a single layer thick, encases the multicellular cortex. In distinct hair fibers there is also a third cell type present in the centrally located medulla. The proliferative cells that give rise to the mature hair fiber reside in the bulb at the base of the follicle. As they leave the germinative compartment, trichocytic differentiation begins, and in matrix, cuticular, and cortical cells, the genes for two families of structural proteins are activated (1). These families comprise the hair keratins and the hair keratin-associated proteins. The human hair keratin family consists of at least 16 members that are divided into an acidic type I and a basic to neutral type II subfamily (2,3). To fulfill their biological function, specific type I and type II hair keratin pairs are sequentially expressed and assembled into 10-nm heteropolymeric keratin intermediate filaments, termed KIFs. At the height of the lower to middle cortex, these filaments are embedded into a matrix that is formed by the hair keratin-associated proteins, termed KAPs. Based on amino acid composition, essentially three classes of KAPs have been described: the high sulfur KAPs (<30 mol % cysteine content), the ultrahigh sulfur KAPs (>30 mol % cysteine content), and the high tyrosine/glycine KAPs. Up to now, these three classes have been further subdivided into 15 distinct KAP multigene families, based on amino acid homologies and the nature of their repeat structures. Thus, the KAP1-3 and KAP10-15 families belong to the high sulfur KAPs (1, 4-14); the KAP4, KAP5, and KAP9 families encompass ultrahigh sulfur KAPs (15-20); and the KAP6-8 families comprise tyrosine/glycine-rich KAPs (21-23).
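The cysteine-content criterion that separates high sulfur from ultrahigh sulfur KAPs can be illustrated with a short calculation. The sketch below is only an illustration: the function names are hypothetical, the treatment of a value of exactly 30 mol % is an assumption (the text gives only "<30" and ">30"), and the high tyrosine/glycine class, which is defined by other residues, is not handled.

```python
def cysteine_mol_percent(seq: str) -> float:
    """Return the cysteine content of a protein sequence in mol %."""
    seq = seq.strip().upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * seq.count("C") / len(seq)


def sulfur_class(seq: str, threshold: float = 30.0) -> str:
    """Classify a KAP sequence by cysteine content.

    Assumption: a value of exactly `threshold` is treated as high sulfur,
    because the text does not say how the boundary case is assigned.
    """
    if cysteine_mol_percent(seq) > threshold:
        return "ultrahigh sulfur"
    return "high sulfur"


if __name__ == "__main__":
    # Toy sequences for illustration only; these are not real KAP proteins.
    for toy in ("CCQPS" * 4, "ACDEFGHIKL"):
        print(toy, round(cysteine_mol_percent(toy), 1), sulfur_class(toy))
```

Applied to a full translated open reading frame, the same calculation gives the mol % cysteine figures of the kind reported for each protein in Table II.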
The genes of individual KAPs of a distinct family were identified in various species, and the localization of more than one KAP gene on a given DNA fragment suggested the clustering of these genes in mammalian genomes (4, 7-9, 12, 15, 17). Thus, in both sheep and man, a gene of the KAP1 family has been identified on chromosome 11q25 and the syntenic chromosome 17q12-21, respectively, regions also known to contain sheep and human type I cytokeratin genes (4,24,25). Moreover, human KAP5 genes could be assigned to chromosome 11p15 and 11q13 (26), and a cluster of high tyrosine/glycine KAP genes has been identified on chromosome 21q22.1 (27).
The premise that KAP genes are clustered in the mammalian genome prompted us to screen a human P1 artificial chromosome (PAC) library using a conserved probe of human KAP1.1A (4). The PAC clones isolated led to the identification of contiguous DNA sequences in the EMBO/GenBank(TM) data base containing 37 genes for high and ultrahigh sulfur KAPs, which could be classified into seven KAP multigene families. cDNA transcripts of 30 KAP genes from these families were isolated from an arrayed human scalp cDNA library. In situ hybridization using 3'-noncoding region probes from single members of the human KAP1, -2, -3, -4, and -9 families showed expression of the respective mRNAs in the differentiated portion of the hair cortex.

EXPERIMENTAL PROCEDURES
Isolation of KAP Gene-containing PAC Clones-A 764-bp genomic PCR product of human KAP1.1A (hB2A) (4) (see Table I) was used as a hybridization probe to screen an arrayed human PAC library cloned into the PAC vector AD10SacBII (library and screening by Genome Systems, Inc., St. Louis, MO). Six clones, termed PAC7-PAC12, were isolated (see Fig. 1). Initially, PAC DNA was prepared using a DNA midiprep kit (Qiagen, Hilden, Germany), and the ends of the isolated PAC clones were sequenced using a 33P chain termination cycle sequencing kit (Amersham Pharmacia Biotech) as described previously (3).
Arraying and Screening of a Human Scalp cDNA Library-Screening of a human scalp cDNA library cloned into the λ-ZapII vector was accomplished as described previously (28). At a later date, arraying of single bacterial phagemid clones into microtiter plates and the preparation of DNA hybridization filters from these single clones were performed. Briefly, a mass in vivo excision/conversion of the human scalp library was accomplished, converting the λ-library clones into a more manageable Bluescript phagemid form. Dilutions of this excised library (~6000 clones) were plated onto 22 × 22-cm LB agar plates under ampicillin selection, and single colonies were picked and arrayed into 132-well microtiter dishes using a spotting robot. Approximately 26,000 of the single clones were then double-spotted onto a single, gridded 22 × 22-cm nylon membrane in an ordered fashion. The single colonies were

TABLE I. Primers used for the generation of hybridization probes for KAP cDNA screening. In addition to the 29 KAP cDNA sequences found by hybridization using probes derived from the gene sequences represented here, one ultrahigh sulfur KAP clone was found as a by-product of another study (the isolation of high tyrosine/glycine KAP families). +, PCR-derived genomic hybridization probe from a conserved human KAP6 family member found on human chromosome 21 (Hattori et al. (27)).

The arrayed human scalp cDNA library filters were screened using 150-250-bp 3'-noncoding region probes from the putative hair keratin-associated protein genes discovered in the EMBO/GenBank(TM) data base BAC clones (see Table I). Prehybridization/hybridization was performed using 5× SSC, 5× Denhardt's solution, 0.1% sodium pyrophosphate, 1% SDS as hybridization buffer at 59°C overnight. Post-hybridization washes were performed 3 times using 0.5× SSC, 1% SDS at 59°C for 30 min. The filters were autoradiographed using Kodak XAR5 film (Amersham Pharmacia Biotech).
The position of doubly positive clones on the filter was determined using a 5 ϫ 5 scoring pattern, and the coordinates from the scoring pattern were then used to obtain the correct clone from the scalp library (the clones described in Table II are available from the German Resource Center or from the authors).
Subcloning and Characterization of PAC10-The subcloning of PAC NdeI restriction enzyme fragments has been described previously (2). The subcloning of PAC10 resulted in the isolation of ~80% of the total NdeI fragments. End sequencing of the isolated NdeI subclones with subsequent EMBO/GenBank(TM) analysis of these DNA sequences identified many of these sequences on the draft DNA sequence (in-progress sequencing) of two BAC clones (accession numbers AC025904 and AC037482). At the time of analysis these clones were 174,032 and 180,944 bp in size and consisted of 17 and 30 fragments, respectively. Comparison of these draft sequences with the subcloned PAC10 sequences allowed a complete orientation of the relevant draft sequences on PAC10.
Automated Fluorescent DNA Sequencing-Sequencing of the clones isolated from the arrayed human scalp cDNA library as well as the NdeI subclones from PAC10 was performed by fluorescence dye terminator

DNA and Protein Analysis-DNA analysis of PAC, BAC, and cDNA clones was done using the Wisconsin GCG software package (version 10) as contained in the Heidelberg Unix Sequence Analysis Resource. The initial KAP EMBO/GenBank(TM) data base searches were performed using the BLASTN program. KAP homology searches on individual BAC clones were accomplished using the SIMILARITY program. Multiple KAP protein homology comparisons were done using the CLUSTAL program. Repeat structure analysis was performed using the program DOTPLOT. cDNA sequence assembly and correction were accomplished using the STADEN program package.
In Situ Hybridization-In situ hybridization (ISH) of cryostat sections of human scalp containing hair follicles (taken for medical reasons) was performed as described previously (29-31). 35S-Labeled cRNA transcripts of hKAP1.5, hKAP2.4, hKAP3.3, hKAP4.3, and hKAP9.2 derived from subcloned 3'-noncoding region PCR probes (see Table I) were used for detection of the respective KAP mRNA species. Specific ISH signals were visualized using a confocal laser scanning microscope (LSM 510; Carl Zeiss, Jena, Germany). Reflected ISH signals obtained through epi-illumination and transmitted light images obtained in bright field were visualized simultaneously and combined by overlay using pseudocolors in Fig. 8 (transmission image green, electronically changed to black/white; reflection image (ISH signal) shown in red).

Discovery of a Domain of High/Ultrahigh Sulfur KAP Genes-Screening of an arrayed human PAC library with a full-length probe of the human KAP1.1A gene (4) led to the isolation of six PAC clones, termed PAC7-12. DNA end sequences derived from each of these clones were compared with sequences in the EMBO/GenBank(TM) data base, which resulted in the identification of four fully sequenced BAC clones, AC003958, AC006070, AC007455, and AC00423, representing two DNA contigs on chromosome 17q12-21. Contig 1, comprising BAC clones AC003958 and AC006070, covered ~290 kb, and contig 2, consisting of BAC clones AC007455 and AC00423, was 223 kb in size (Fig. 1). In addition, two partially sequenced BAC clones, AC025904 and AC037482 (not shown in Fig. 1), were also identified. Clone AC025904 was, at the time of writing, ~170 kb in size and consisted of 17 fragments. Six of these fragments exhibited sequence identity with AC006070 and covered about 80 kb of one end of contig 1, whereas another fragment corresponded to ~2.2 kb of one end of contig 2, i.e. BAC clone AC007455. Thus, BAC clone AC025904 connected contigs 1 and 2 to each other (Fig. 1). The remaining 10 AC025904 fragments, comprising ~101 kb of DNA, which partially covered the gap between contigs 1 and 2 (Fig. 1), showed low sequence identity to both contigs. The second partially sequenced BAC clone, AC037482 (~180 kb in size), overlapped with ~23.7 kb of one end of AC000423, progressed completely through AC007455, and ended in the gap between contigs 1 and 2; it was also used for fragment identification and orientation in the region between the two contigs.
Identification and Characterization of KAP Genes-Homology analysis of all the genomic DNA sequences described above, using the human KAP1.1A gene open reading frame (ORF) sequence (4) for comparison, led to the identification of 37 putative KAP gene loci. The corresponding KAP gene cluster covered a region of ~300 kb (Fig. 1).
As a rule, the KAP genes were ~1 kb in size, consisted of one single exon, and were separated by 2.5-38 kb of intervening sequences. Furthermore, no unified direction of transcription could be found. Four of the loci, KAP2A, KAP3A, KAP4A, and KAP9A, showed pseudogene character based on the presence of premature stop codons or frameshifts in the coding regions of the genes.
Screening of an arrayed human scalp cDNA library using 3'-noncoding region probes derived from the 37 individual KAP genes led to the isolation of 30 novel KAP cDNAs (Table II). Only 26 corresponded to KAP gene sequences present in the cluster shown in Fig. 1. Thus the combination of human genomic/cDNA sequences analyzed in this report and by others (4) makes a total of 43 human KAP sequences characterized at the gene or cDNA level.
Amino acid translation of all the KAP genomic or cDNA sequences and comparison with the protein sequences of known KAPs from sheep, mouse, rabbit, and human via amino acid multialignment and evolutionary tree formation (Clustal and Clustree) allowed the assignment of the human KAPs to either the high sulfur KAP families KAP1, -2, -3 or the ultrahigh sulfur KAP families KAP4 and -9. These five multigene families were grouped together in a contiguous gene region shown in Fig. 1. Two single KAPs, which did not possess high homology with any other known KAP family member, were termed hKAP16.1 and hKAP17.1 (Fig. 1).
The human high sulfur KAP1 gene family found on this DNA

Table II). The multialignment was performed using the Clustal program (39). The asterisks beside the protein names indicate KAP1 members from other species (s, sheep; r, rat). Asterisks below the alignment indicate sequence identity; dots denote sequence homology. Hyphens in the amino acid sequences show gaps introduced during alignment. The original names of the previously isolated hKAP1.1A and hKAP1.2 genes are shown in brackets. The names of human KAP family members in bold type refer to gene products for which a respective cDNA sequence has been found. Solid boxes frame the putative pentapeptide repeats described in the text. The dotted boxes show the decapeptide repeats described in sheep KAP1 family proteins (1,7,8

Figs. 1 and 2). Two previously isolated human KAP genes, located on a single genomic clone and termed hB2A and hB2B (4), were renamed hKAP1.1A and hKAP1.2 in this paper to reflect changes that have recently occurred in KAP nomenclature (1). Neither of these hKAP1 genes has been found on the DNA domain of Fig. 1, nor could an hKAP1.2 cDNA be isolated. The hKAP1.1A and hKAP1.1B genes are nearly identical in DNA sequence. The hKAP1 proteins are 12.3-18.2 kDa in size and contain 24.1-25.7 mol % cysteine (Table II). The members of the hKAP1 family exhibit two unique, highly conserved motifs at the beginning and in the center of each coding region (Fig. 2). In addition, a 20-amino acid subdomain downstream of this region (ASCCRPSYCGQSCCRP(A/V)CCC) is completely conserved in all species examined. The KAP1 proteins contain a variable number of pentapeptide repeats (CCQ(P/T)S, CCETS), and a specific combination thereof (CCETS CCQPS) shows similarity to the decapeptide repeat (SIQTSCCQPT) described in the sheep KAP1 family (1,7,8).
All KAP family members analyzed appear to have unique, family-specific amino- and carboxyl-terminal regions (see below). For the hKAP1 family, these sequences are M(A/T)CCQT (amino terminus) and (C/S)EPTC (carboxyl terminus). Two exceptions are hKAP1.4, which does not possess the amino-terminal homology, and hKAP1.2, which lacks the carboxyl-terminal homology.
hKAP2 proteins are 13.48-13.51 kDa in size, and their cysteine content (27.3-28.1 mol %) is slightly higher than that of the hKAP1 proteins (Table II). They exhibit a high amino acid sequence homology to each other, as well as to two known sheep KAP2 sequences (6, 14) (Fig. 3). hKAP2 proteins consist of a series of repetitive pentamers in their amino- and carboxyl-terminal domains, which are separated by a KAP2-specific central amino acid region (Fig. 3). The pentamer repeats consistently start with double cysteines. Similar to the hKAP1 family, the amino and carboxyl termini of hKAP2 members appear to be family-specific sequences (amino terminus, MTGSCC; carboxyl terminus, CRTSSC).
The human high sulfur hKAP3 gene cluster contained four members, hKAP3.1, hKAP3.2, and hKAP3.3, and pseudogene hKAP3A (Fig. 1). The cDNAs of the three functional genes could be identified. The KAP3 proteins are 10.3-10.5 kDa in size and have a rather low cysteine content of 18.3-19.4 mol % (Table II). Like the KAP2 family members, the human KAP3 family members also display high homology among each other and possess an identical number of amino acids (Fig. 4). hKAP3 family members as well as their known orthologs of other species (1,6,9) do not exhibit a discernible repeat structure and exhibit a weaker head and tail sequence specificity (Fig. 4).
Although at present only two ultrahigh sulfur KAP4 family members are known from sheep and rabbit (15,16), the human KAP4 gene family turned out to be the largest KAP family located on the DNA domain shown in Fig. 1. It is composed of 15 genes, hKAP4.1-hKAP4.15, as well as one pseudogene, hKAP4A. Thirteen members have been identified as cDNA sequences (Table II). However, the genes corresponding to the cDNAs hKAP4.13 and hKAP4.15 could not, as yet, be found on the isolated DNA domain (Fig. 1). The size of the encoded proteins ranges from 11.5 to 29.7 kDa, and their cysteine content varies between 33.6 and 36.8 mol % (Table II). The hKAP4 members exhibit a highly conserved amino-terminal end region, MV(S/N)SCC, followed by a central region of highly repetitive, dicysteine-containing pentamers, separated by occasional pentameric, non-repetitive segments (Fig. 5). In contrast to the hKAP1-3 families, no large, non-repetitive regions are present in the central domain of hKAP4 members. The carboxyl-terminal end domains of hKAP4 proteins are less conserved, but many members contain a terminal CC(G/A)SSCC sequence (Fig. 5).
The ultrahigh sulfur hKAP9 family turned out to be the second largest family found. It consisted of seven gene sequences, hKAP9.1-9.7, two cDNA sequences not found on the DNA contig (hKAP9.8-hKAP9.9), and one pseudogene, hKAP9A (Figs. 1 and 6). The size of the hKAP9 members ranges from 16.3 to 26.3 kDa, and their cysteine content varies from 31.3 to 35.6 mol % (Table II). The structure of the hKAP9 family members is, in principle, reminiscent of that of the KAP2 family in that amino- and carboxyl-terminal pentad repeat structures are separated by central, conserved and hKAP9-specific amino acid sequences. Similar to the KAP4 family, the repetitive elements consist of multiple double cysteine-containing repeats, interrupted by two larger non-repetitive regions of 14 and 33 amino acids. Like the other KAP families, KAP9 family members display conserved amino- and

Based on their cysteine content, the proteins encoded by the two novel KAP genes hKAP16.1 and hKAP17.1 (Figs. 1 and 7) were identified as high sulfur (hKAP16.1) or ultrahigh sulfur KAPs (hKAP17.1), respectively (Table II). Remarkably, they constitute both the largest (hKAP16.1, 53.9 kDa) and the smallest (hKAP17.1, 9.5 kDa) of all hKAPs whose genes are located on the cluster described in Fig. 1. Screening of the cDNA library yielded only a cDNA clone for hKAP17.1.
Hair Follicle KAP Expression-Radioactive in situ hybridizations using 35S-labeled 3'-noncoding region probes derived from representative members of each multigene KAP family found on the DNA contig (hKAP1.5, hKAP2.4, hKAP3.3, hKAP4.3, and hKAP9.2) were performed on cryosections of human scalp epidermis (a detailed expression study of all KAP members is in progress). Four of the probes were specific for each of the respective genes analyzed (hKAP1.5, hKAP2.4, hKAP3.3, and hKAP4.3). A unique probe for the hKAP9.2 gene could not be found. The probe used also showed very high sequence identity to the hKAP9.4 and hKAP9.5 genes. The expression pattern for hKAP9.2 shown in Fig. 8e therefore probably represents the expression patterns of all three genes. All five KAP family members tested showed their respective mRNA expression specifically in the middle/upper portions of the hair cortex, in the region termed the keratogenous zone (Fig. 8). No hybridization signals could be found in the hair matrix and cuticle, or in the inner or outer root sheath, for any of the KAP family members assayed. In addition, hKAP1.5 and hKAP9.2 showed no obvious expression in the medulla of the hair follicle. Medullar expression could not be determined for hKAP1.5, hKAP2.4, and hKAP3.3 because no clearly medullated hairs were found in these samples. In general, the ISH signal intensity was relatively uniform among all of the KAP family members.

DISCUSSION
In the present study we have characterized an ~300-kb DNA contig, which harbors 37 members of the human hair KAP multigene family. Moreover, we were able to show that this gene cluster is embedded in the type I keratin gene domain on chromosome 17q12-21 (Fig. 1). Adjacent to one end of the KAP gene cluster lie six genes (hHa7, hHaA, hHa1, hHa4, hHa3-II, and hHa3-I) that are part of the previously described type I hair keratin gene domain (3). On the other side of the KAP gene cluster, we identified the genes for cytokeratins hK20 and hK12 (32,33).
These functional genes are, however, preceded by an orphan exon and a transcribed pseudogene that corresponds to the EMBO data base cDNA sequence AL117538, originating from a human testis cDNA library (see accession file). This gene contains seven exons and six introns, with the last six exons displaying a high homology to the corresponding exons of the adjacent hK12 and hK20 genes, whereas exon one differs completely from keratin gene sequences. The gene possesses an open reading frame for a truncated keratin, featuring the loss of the highly conserved helix initiation motif that is indispensable for correct intermediate filament formation (34). This suggests that the gene represents a transcribed type I keratin pseudogene. The gene is thus reminiscent of a similarly truncated form of the human type II hair keratin gene, hHb1, which contains exons 5-8 of this gene and which is expressed aberrantly in breast tumors (35). The truncated hHb1 protein has been shown to be synthesized in these tumors (36).
In the region between these two keratin gene domains, on BAC clones AC006070, AC025904, AC037482, and AC007455 and partially on clone AC003958, we identified a cluster of 37 KAP genes. They can be grouped into three high sulfur KAP families, hKAP1, hKAP2, and hKAP3, two ultrahigh sulfur families, hKAP4 and hKAP9, and two single high/ultrahigh sulfur hKAP genes, hKAP16.1 and hKAP17.1. Due to the partial nature of BAC clone AC025904, which connects the other two fully sequenced genomic contigs, neither the exact size of this KAP gene cluster nor the exact number of KAP genes in this cluster is known. An approximation of the completeness of clone AC025904 can be made, however. The size of AC025904 has been estimated at 170 kb by pulsed-field gel electrophoresis (see EMBO accession file), and the amount of DNA sequenced thus far is 172,432 bp, which would suggest that this BAC clone sequence is probably over 90% complete. In addition, the identification of four further KAP cDNAs, hKAP4.13, hKAP4.15, hKAP9.8, and hKAP9.9, whose genes do not lie on the current genomic sequence but structurally belong to their respective families, increases the probability that the majority of the KAP genes in the 300-kb region have been elucidated. On the other hand, this finding suggests that further individual KAP genes or gene clusters of some of the families described above should be present in other chromosomal locations. This also includes the previously described hKAP1.1A and hKAP1.2 genes (4), which have not been found on the current contig. An EMBO data base search using hKAP1.2, hKAP4.13, hKAP4.15, hKAP9.8, and hKAP9.9 genomic/cDNA sequences did not, however, reveal further genomic clones that could harbor these sequences.
The high/ultrahigh sulfur hKAP genes were essentially identified by their homology with previously described KAP gene sequences from other species. In order to affirm that the gene loci identified were transcribed members of the hair KAP family, an attempt was made to isolate their respective cDNA sequences from an arrayed human scalp cDNA library. cDNA library screening was chosen over direct amplification of hair follicle mRNA by reverse transcriptase-PCR, considering that the lack of intron sequences in the KAP genes would not allow discrimination between true reverse transcriptase-PCR amplification products and the artifactual amplification of genomic DNA sequences. The isolation of cDNAs by this method, however, was partially limited by the conservation of sequences among KAP family members in their 3'-noncoding regions. For example, hKAP9.2 and hKAP9.4-hKAP9.9 were all fairly homologous to each other in these regions. Thus, screening with one member of a family often led to the isolation of another member that exhibited stronger expression patterns. This conservation of sequence also led to the fortuitous isolation of related cDNAs not, as yet, found on the genomic sequences analyzed (i.e. the cDNAs for hKAP4.13, hKAP4.15, hKAP9.8, and hKAP9.9). Weak expression patterns would also explain why several KAP family members (hKAP1.4, hKAP2.3, hKAP9.1, hKAP9.6, hKAP9.7, and hKAP16.1) were not found, for the arrayed scalp library used in these studies consisted of only 26,000 clones and therefore allowed only detection of moderate to high KAP cDNA expression.
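The link between library size and detectable expression level can be made concrete with a standard sampling estimate. The sketch below is not a calculation taken from this study: it assumes that each arrayed clone is an independent random draw from the transcript pool, so that a transcript present at relative abundance f appears at least once among N clones with probability 1 - (1 - f)^N. The function names are hypothetical; the default N of 26,000 is the number of arrayed clones stated in the text.

```python
import math


def detection_probability(freq: float, n_clones: int = 26_000) -> float:
    """Probability that at least one of n_clones carries a transcript whose
    relative abundance in the library is `freq` (assumes random sampling)."""
    if not 0.0 <= freq <= 1.0:
        raise ValueError("freq must lie between 0 and 1")
    if n_clones < 0:
        raise ValueError("n_clones must be non-negative")
    if freq == 1.0:
        return 1.0 if n_clones > 0 else 0.0
    # 1 - (1 - f)^N, written with log1p/expm1 to stay accurate for small f.
    return -math.expm1(n_clones * math.log1p(-freq))


def min_detectable_frequency(prob: float, n_clones: int = 26_000) -> float:
    """Smallest relative abundance that is represented at least once with
    probability `prob` in a library of n_clones clones."""
    if not 0.0 <= prob < 1.0:
        raise ValueError("prob must lie in [0, 1)")
    if n_clones <= 0:
        raise ValueError("n_clones must be positive")
    return -math.expm1(math.log1p(-prob) / n_clones)


if __name__ == "__main__":
    for f in (1e-3, 1e-4, 1e-5):
        print(f"abundance {f:g}: P(detect) = {detection_probability(f):.3f}")
    print(f"abundance for 95% detection: {min_detectable_frequency(0.95):.2e}")
```

Under these assumptions, transcripts much rarer than about one in 10,000 clones have a substantial chance of being absent from the array, which is consistent with the interpretation that weakly expressed KAP genes were missed.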
Protein homology comparisons of the translated hKAP gene/cDNA ORFs with previously described KAP families (1) allowed a clear-cut division of most of the hKAP proteins into five distinct families. Historically, the initial division of high/ultrahigh sulfur KAPs into families and their numerical assignments according to the proposed KAP nomenclature were based on their mol % cysteine content, their homology to each other, and, to a certain degree, the nature of their repeat structures (1). However, recent and simultaneous publications of various murine KAPs have led to confusion in the current KAP nomenclature. In 1998, Takaishi et al. (11) isolated a mouse KAP expressed in the periderm of embryonic day 16.5 mouse skin but not found in the mouse hair follicle. Its cysteine-rich structure, however, as well as its presence in the filiform papillae of the tongue and in tail scale epidermis led to the conclusion that it is a member of the high sulfur KAP family, termed mKAP13.1 (11). At the same time, Aoki et al. (13) described a high sulfur KAP cDNA found in mouse hair follicles. The authors also named this protein mKAP13, but currently, it is designated in the data base as mKAP14.1. Thereafter, two additional mouse KAP genes were identified, termed pmg1 and pmg2 (12). Pmg1 exhibits 94% amino acid identity with the protein described by Aoki et al. (13) and thus should be named mKAP14.2. Pmg2, in contrast, exhibits only 43% identity with pmg1 (i.e. mKAP14.2) (12) or the current mKAP14.1 (13), and these are the KAPs with the highest homology to pmg2. Therefore, pmg2 should be designated mKAP15.1. As a consequence, the two single hKAP genes identified in this study, which did not readily fit into the schemes of the known KAP gene families, were named hKAP16.1 and hKAP17.1. hKAP16.1 is the largest high sulfur KAP found to date (517 amino acids). In general, hKAP16.1 possesses a fairly low homology to the other KAP family members (~35-48% identity). The highest identity is seen with the sheep high sulfur KAP10 (48.2%), also one of the largest KAP proteins known (294 amino acids) (1).
Multiple alignments among any members of one KAP family give a much higher degree of homology (~60-98%) than comparisons between sKAP10 or hKAP16.1 and these families. As such, hKAP16.1 should be considered as a member of a separate high sulfur KAP family. The ultrahigh sulfur hKAP17.1 is a very small protein (9.1 kDa) and contains a central cluster of cysteine/glycine repeats, very similar to members of the ultrahigh sulfur KAP5 family (1,18,20). The amino acid homology between hKAP17.1 and the known KAP5 family members is, at most, 47.8%, whereas the range of sequence identity between the known KAP5 members is in the order of 56-76%. In addition, all members of the KAP5 family isolated so far contain a highly conserved 14-mer amino acid sequence (P/TCCC/-VPAC-SCCSSC) not present in hKAP17.1.
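The percent identity values used in these family assignments depend on how identity is counted over an alignment. The sketch below shows one common convention and is not the procedure of the GCG or Clustal programs used in this study: it takes two sequences that have already been aligned to equal length (with "-" marking gaps), skips every column in which either sequence has a gap, and reports the fraction of identical residues among the remaining columns. The function name and the choice of denominator are assumptions.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over the gap-free columns of a pairwise alignment.

    Assumption: columns containing a gap in either sequence are excluded
    from both the numerator and the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for res_a, res_b in zip(aligned_a.upper(), aligned_b.upper()):
        if res_a == gap or res_b == gap:
            continue  # skip gapped columns
        compared += 1
        if res_a == res_b:
            identical += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * identical / compared


if __name__ == "__main__":
    # Toy pentapeptides for illustration only.
    print(percent_identity("CCQPS", "CCQTS"))
    print(percent_identity("CC-PS", "CCQPS"))
```

Because alternative conventions (for example, dividing by the length of the shorter sequence) give different numbers, identity thresholds between families are only comparable when the same convention is applied throughout.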
It should be emphasized that the division of the human KAPs into distinct families was facilitated by bioinformatic means. Multiple amino acid alignments of all the KAPs isolated here were used in combination with the previously described KAP proteins from several species in order to generate a single evolutionary tree of all known KAP proteins (Fig. 9). This procedure allowed a graphic representation of homologous family members that neatly grouped previously characterized family members together with the human members found on the genomic contig characterized in this paper. Further Clustal alignment of the known/new family members affirmed the validity of this approach (see Figs. 2-6). Additionally, this tree grouped together several additional KAP family members from other species, thus indicating families that might also be clustered together on the mammalian genome. Although this procedure is statistically insufficient for evolutionary tree analysis due to KAP protein size variability, it appears in this case to be a reliable tool for the division of KAPs into families.
FIG. 7. Amino acid sequence of hKAP16.1 (A) and hKAP17.1 (B). An asterisk at the end of the amino acid sequence indicates a stop codon.

The presence of cysteine-rich repeat structures has been described in most of the previously characterized KAP proteins. These structures often contain double or triple cysteines, and several of these proteins (mostly of the KAP4 family) possess serine or threonine doublets (1). Decapeptide repeat structures have been shown in several KAP1 family members from other species (1,7,8) as well as in the single member of the mKAP9 family (1). Pentapeptide repeats have been shown in amino acid fragments from members of the sheep KAP1 and KAP2 families (6), and frequently occurring pentameric repeats of the form CC(R/Q)P(S/T) have been found in sheep and rabbit KAP4 family members (1,15,16). In contrast, little repetitiveness of structures is seen in the KAP3 family (1,9). In humans, repeat structures of the KAP1, -2, -4, and -9 family members consist essentially of amino acid variations of a double cysteine-containing pentapeptide. This pentapeptide structure is strikingly visible upon analysis of KAP sequences using the program DOTPLOT, which graphically illustrates this repetitiveness. The degree of pentapeptide repetitiveness in the human KAP1 family ranges from 9 to 16 elements. Two types of these elements, CCQPS and CCETS, correspond, to a certain degree, to the 10-member repeat elements previously reported in sheep and rat (1,7,8). In humans, however, this element is not as highly conserved. The double cysteine pentapeptide members of the human KAP2 family show 10-11 repeats and appear to have more variation in the third or the fifth amino acid when compared with sheep KAP2 members (6). The degree and absoluteness of structure in the central portions of all KAP4 family members are striking. This subdomain consists of nearly invariant double cysteine-containing pentameric repeats. Many of these repeats contain serine/threonine or proline in position 4 and primarily serine and threonine in position 5. The degree of repetitiveness in the hKAP4 family is high (24-29 elements). The hKAP9 family members also contain pentameric repeat structures (14-18) that largely resemble the structures found in the hKAP4 family. Similar to sheep KAP3 (6,9), the human KAP3 orthologs contain minimal repeat structures.
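The pentapeptide repetitiveness described above lends itself to a simple programmatic count. The following is a minimal sketch in Python, assuming that a repeat element can be approximated as two cysteines followed by any three residues; the stricter sheep/rabbit KAP4 consensus CC(R/Q)P(S/T) is included for comparison. The function name `count_cc_pentapeptides` and the toy sequence are hypothetical illustrations, not real hKAP sequences, and the repeat counts reported here were not necessarily obtained in this way.

```python
import re

# Assumed pattern: a double cysteine followed by any three residues.
CC_PENTAPEPTIDE = re.compile(r"CC[A-Z]{3}")
# Stricter consensus reported for sheep and rabbit KAP4: CC(R/Q)P(S/T).
KAP4_CONSENSUS = re.compile(r"CC[RQ]P[ST]")


def count_cc_pentapeptides(seq, pattern=CC_PENTAPEPTIDE):
    """Count non-overlapping matches of `pattern` in a protein sequence."""
    return len(pattern.findall(seq.upper()))


# Toy sequence, made up for illustration (not a real KAP protein).
toy = "MCCQPSCCETSCCRPTAAACCQPT"
print(count_cc_pentapeptides(toy))                  # loose double-cysteine pattern
print(count_cc_pentapeptides(toy, KAP4_CONSENSUS))  # strict KAP4-type consensus
```

Because matches are counted without overlap, a triple cysteine is scored as a single element; a different convention would be needed if overlapping elements were to be counted separately.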
In addition to repeat structures, members of the hKAP1, hKAP2, and hKAP9 families contain central, highly conserved non-repetitive structures that are specific for each family. In the hKAP1 family, these are two regions of 21 and 43 amino acids (see hatched boxes in Fig. 2). These regions are also largely conserved in sheep and rat. In the hKAP2 family, this unique, non-repetitive structure is 24 amino acids long and is highly conserved (see hatched box, Fig. 3). The hKAP9 family non-repetitive regions are 14 and 33 amino acids in length and, with the exception of hKAP9.1 which does not appear to contain a non-repetitive region, are also highly conserved (see hatched boxes, Fig. 6).
The degree of DNA/amino acid sequence conservation between members of one hKAP family is often quite strong. This is especially the case among the hKAP2, hKAP3, and several of the hKAP4 family members. (Table II). A very similar case is seen in three members of the KAP4 family, hKAP4.8, hKAP4.11, and hKAP4.14. All three have a very high sequence identity both in their coding and non-coding regions. hKAP4.8 and hKAP4.14 possess, however, a 106-bp insert in their ORF which is not found in hKAP4.11. When compared with each other, the hKAP4.8 and hKAP4.14 proteins have the same number of amino acids as well as a high sequence identity at the DNA (95%) and protein (92%) level. cDNA sequences have been found for hKAP4.8 and hKAP4.14.

FIG. 9. The division of keratin-associated proteins into families via Clustree. A multiple alignment of the complete sequences of all known high/ultrahigh sulfur KAP proteins as well as KAP proteins found in this report was performed using the program Clustal (39). The segregation of the previously known multialigned family members with the KAP proteins described in this paper was performed using the program Clustree (40). KAPs named in black are the proteins described above that segregated with known KAP family members from other species. KAPs named in purple, hKAP16.1 and hKAP17.1, denote two proteins that did not segregate well with known family members. KAPs named in red represent proteins found in other species. KAPs named in green denote previously described human KAP proteins. This multiple tree alignment via Clustal was adequate enough for division of the KAP proteins into families. It was not statistically significant enough to determine paralogous evolutionary relationships. Additional accession numbers not listed in Table II
Other members of the KAP4 family that show high amino acid homology among each other are shown as sequences of identical color in Fig. 5.
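Identity values such as those quoted above are of the kind obtained by comparing two aligned sequences position by position. The following is a minimal sketch in Python, assuming two pre-aligned sequences of equal length with "-" marking gaps, and assuming that columns containing a gap are excluded from the comparison. Conventions for gap handling differ between programs, and the one used for the values in this paper is not stated. The function name `percent_identity` and the toy fragments are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # skip gap columns (assumed convention)
        compared += 1
        if x == y:
            matches += 1
    # Return 0.0 when no column could be compared, to avoid dividing by zero.
    return 100.0 * matches / compared if compared else 0.0


# Toy aligned fragments, made up for illustration (not real hKAP4 sequences).
print(percent_identity("CCQPSCCRPT", "CCQPTCC-PT"))
```

The same function applies to aligned DNA or protein sequences, since it only compares characters column by column.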
The high homology among members of a single KAP family, coupled with the strong similarities in the cysteine repeat structure inside a family, leads to the conclusion that multiple family members might have arisen by gene duplication, especially through duplications/deletions of the cysteine repeat motifs of these genes. The inability to delineate specific orthologous members between family members of the various species isolated to date might lead to the hypothesis that multiple members of a distinct family in a given species might have evolved independently from members in other species. These new members, however, probably originated from a common family-specific ancestral gene found in all mammals.
The presence of a high cysteine content in the high/ultrahigh sulfur KAPs has led to the assumption that these residues might play an important role in the bundling of KIFs via cysteine cross-links (1). Since the pentameric repeats of the hKAP1, -2, -4, and -9 families contribute the majority of cysteines in these members, they probably are significant in this KIF bundling. This hypothesis is also consistent with the expression profile of each member of the hKAP1-3, -4, and -9 families analyzed. They show a unique and specific mRNA expression in the differentiated, highly keratinized portions of the hair cortex (Fig. 8). The expression pattern of the KAP1, -2, -4, and -9 family members largely reflects that seen in other species. For example, the cortical expression pattern of hKAP1.5 (Fig. 8, a and a′) is similar to that seen for rat B2F (KAP1 family) (7). In rat, the absence of B2F signal in the medulla could be determined. For the KAP2 family, hKAP2.4 expression (Fig. 8b) is similar to that seen in one sheep KAP2 family member (1). Like hKAP4.3 (Fig. 8d), upper cortical expression of KAP4 family members has also been shown in rabbit (15) and sheep (16). All of the previously described species show KAP4 family member expression in the differentiated portions of the hair cortex. However, the expression in sheep, unlike rabbit and human, is localized exclusively to the paracortical cells of this region. In human hairs, a discrimination between ortho- and paracortex ("tight" versus "loose" packing of KIF bundles) is barely distinguishable. Transmission electron microscopy studies on Caucasian hair follicles largely show a central core of paracortical cells surrounded by orthocortical cells (37). Despite this pattern, no obvious ortho-/paracortical division of signal intensity could be seen in hKAP4.3 expression. hKAP9.2 expression (Fig. 8e) is similar to that seen in mouse mKAP9.1 (38). However, mKAP9.1 also showed a cuticular expression not seen for hKAP9.2. Finally, the upper cortical expression of hKAP3.3 (Fig. 8c) shows, for the first time, expression by a KAP3 family member.
The characterization of this domain of KAP family genes provides a starting point for the complete characterization of the human high/ultrahigh sulfur keratin-associated proteins. Several questions remain unanswered concerning the completeness of the KAP multigene families presented here, especially with respect to other chromosomal loci that might harbor further KAP gene members of the families described in this study. In addition, comprehensive mRNA and protein expression studies of all KAP family members presented here are needed in order to gain a better picture of how hair keratin-associated proteins are expressed in the hair follicle. This should eventually lead to functional studies of KAPs giving further insight into their precise role in hair fiber formation.