Characterization of a First Domain of Human High Glycine-Tyrosine and High Sulfur Keratin-associated Protein (KAP) Genes on Chromosome 21q22.1*

Analysis of the EBI/GenBank™ database using non-human hair keratin-associated protein (KAP) cDNA sequences as a query resulted in the identification of a first domain of high glycine-tyrosine and high sulfur KAP genes located on human chromosome 21q22.1. This domain, present on the DNA sequences with accession numbers AP001708 and AP001709, was ∼535 kb in size and contained 17 high glycine-tyrosine and 7 high sulfur KAP genes, as well as 9 KAP pseudogenes. Based on amino acid sequence comparisons of the encoded proteins, the KAP genes could be divided into seven high glycine-tyrosine gene families (KAP6-KAP8 and KAP19-KAP22) and four high sulfur gene families (KAP11, KAP13, KAP15, and KAP23). The high glycine-tyrosine genes described here appear to represent the complete set of this type of KAP gene in the human genome. Systematic cDNA isolation from an arrayed scalp cDNA library, together with in situ hybridization studies of all of the KAP genes identified in the 21q22.1 region, revealed expression, to varying degrees and in varying regions, of 11 high glycine-tyrosine and 6 high sulfur KAP genes in the hair-forming compartment.

The hair follicle is one of the few organs of the body that, throughout life, undergoes alternating cycles of growth, senescence, and rest (1). Morphologically, the hair follicle is composed of external epithelial compartments (the outer and inner root sheaths and the companion layer) and a central hair fiber-forming (trichocytic) compartment comprising the matrix, cuticle, and cortex. In specific hair types, a centrally lying medulla is also present. Hair growth originates in matrix cells located in the bulb of the hair follicle. This hair bulb surrounds a dermal fibroblast condensate, termed the dermal papilla, which is important for hair follicle morphogenesis (1). The main structural proteins of the hair fiber are the hair keratins and the hair keratin-associated proteins (KAPs), the latter being encoded by a large number of multigene families (2). Hair keratins, a subset of the large keratin family whose members are found in all cells of epithelial origin (3,4), represent two multigene families, the type I (acidic) and type II (basic) families, which comprise 15 members in humans (5,6). They form the 8-10-nm intermediate filaments of trichocytes through co-polymerization of type I and type II members, which are differentially expressed during hair fiber development (7,8). KAPs, which form the matrix between the hair keratin intermediate filament bundles through extensive disulfide cross-linking with cysteine residues in the head and tail domains of hair keratins, possess either a high cysteine or a high glycine-tyrosine content (2) and are further divided into three broad groups comprising, to date, 17 families.
These are the high sulfur families KAP1-KAP3 and KAP10-KAP16, which contain less than 30 mol % cysteine (9-20); the ultra-high sulfur families KAP4, KAP5, KAP9, and KAP17, with more than 30 mol % cysteine (20-26); and the high glycine-tyrosine KAP families, which can be further divided into families with high (>60 mol %, KAP6 family) and lower (<60 mol %, KAP7 and KAP8 families) glycine-tyrosine content. The majority of the KAP members were initially identified in non-human species (sheep, mouse, rabbit). Recently, however, a large domain of human ultra-high/high sulfur KAP gene families (the KAP1-KAP3, KAP4, KAP9, KAP16, and KAP17 families) has been identified on chromosome 17q21.2, and the expression of these genes in the hair follicle has been partially characterized (20,31). Moreover, chromosomal in situ hybridization studies using specific probes derived from human KAP5 family members have mapped new KAP genes to chromosomes 11p15 and 11q13 (32). In addition, the sequencing of human chromosome 21 has recently led to the identification of a further KAP gene domain on its long arm (21q23) (33). Continuing our previous KAP gene studies (20), we describe here the identification of a novel domain of 24 KAP genes (17 high glycine-tyrosine and 7 high sulfur KAP genes) and 9 KAP pseudogenes on chromosome 21q22.1. The genes could be grouped into 11 families, five of which are novel. Screening of a human scalp cDNA library as well as mRNA expression studies showed the expression of 17 individual KAP members, all of which were essentially localized to the hair cortex and several of which also showed matrix and cuticular expression.

MATERIALS AND METHODS
Identification of Hair KAP Genes-Analysis of the EBI/GenBank™ database with DNA sequences from regions encoding previously described KAP genes/cDNAs (KAP6-KAP8 and KAP10-KAP15 family members) using the BLASTN2 program led to the discovery of a contiguous domain of KAP genes on the human EBI/GenBank™ database sequences AP001708 and AP001709 (Fig. 1). Further DNA homology analysis using the program SIMILARITY led to the identification of 33 putative gene/pseudogene loci. Genes with open reading frames that showed high homology to KAP genes from other species were translated into protein sequences, and multiple alignments with known KAP family members were made using the CLUSTAL program. Evolutionary tree analysis of these proteins was performed using the CLUSTREE program. Putative amino acid repeat structures were identified using the program DOTPLOT. All of the programs named above are part of the Heidelberg Unix Sequence Analysis Resource.

* This work was also supported in part by the Deutsche Forschungsgemeinschaft (Grant Schw 539/4-1).
The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™ [...]
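The translation step in the identification procedure above was carried out with the Heidelberg programs. As a stand-alone illustration only, the sketch below scans both strands of a genomic window for the longest ATG-initiated open reading frame and translates it with the standard genetic code (NCBI translation table 1). Both strands are scanned because the genes in this domain do not share a single direction of transcription. The function names are hypothetical, and codons with ambiguous bases are simply translated as X.

```python
# Standard genetic code (NCBI translation table 1); codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(orf: str) -> str:
    """Translate codon by codon, stopping at the first stop codon.
    Codons containing ambiguous bases become 'X'."""
    protein = []
    for pos in range(0, len(orf) - 2, 3):
        aa = CODON_TABLE.get(orf[pos:pos + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def longest_orf(genomic: str):
    """Longest ATG-initiated, stop-terminated ORF on either strand.

    Returns (strand, start, protein) with start as a 0-based offset on the
    scanned strand, or None if no complete ORF exists."""
    genomic = genomic.upper()
    best = None
    for strand, seq in (("+", genomic), ("-", reverse_complement(genomic))):
        for frame in range(3):
            start = None
            for pos in range(frame, len(seq) - 2, 3):
                codon = seq[pos:pos + 3]
                if start is None and codon == "ATG":
                    start = pos
                elif start is not None and CODON_TABLE.get(codon) == "*":
                    protein = translate(seq[start:pos + 3])
                    if best is None or len(protein) > len(best[2]):
                        best = (strand, start, protein)
                    start = None
    return best
```

For example, `longest_orf("ATGGGCTACTAA")` returns `("+", 0, "MGY")`. The single-exon structure of KAP genes is what makes such an unspliced scan a reasonable first pass; it would not suffice for multi-exon genes.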
Isolation of KAP cDNAs-3′-noncoding region PCR fragments from each putative KAP gene were amplified from human genomic DNA (see Table I) and used to screen an arrayed human scalp cDNA library by procedures described previously (20). Briefly, the arrayed cDNA library was screened using 5× SSC, 5× Denhardt's solution, 0.1% sodium pyrophosphate, and 1% SDS as prehybridization/hybridization buffer. The respective ³²P-labeled PCR fragment (see Table I) was used as a probe at 1 × 10⁶ cpm/ml of hybridization solution. The library was hybridized overnight at 59 °C. Posthybridization washes were performed three times in 0.5× SSC, 1% SDS at 59 °C for 30 min. The filters were autoradiographed using Kodak XAR5 film (Amersham Biosciences). The isolated cDNA clones can be obtained either from the authors or from the German Human Genome Resource Center (RZPD) under the clone designations indicated in Table I.
Automated DNA Sequencing-The isolated cDNA clones were sequenced using fluorescent dye terminator cycle sequencing (Big Dye DNA Sequencing Kit, Applied Biosystems, Weiterstadt, Germany) and analyzed on an ABI 310 capillary DNA sequencing apparatus. DNA sequence assembly and correction were performed using the STADEN software package (Heidelberg Unix Sequence Analysis Resource).
In Situ Hybridization (ISH)-ISH on cryostat sections of human scalp (kindly provided by B. Cribier, Strasbourg, France) and, in parallel, on plucked beard hair follicles was performed as described previously (34,35), using the PCR products shown in Table I subcloned into the pCR4.1 plasmid vector (Invitrogen). Antisense ³⁵S-labeled transcripts of all of these subclones were used to detect the respective KAP mRNA species as described previously (8,34). The ISH signals were visualized using a confocal laser scanning microscope (LSM 510; Carl Zeiss, Jena, Oberkochen, Germany). Reflected ISH signals (epi-illumination) and transmitted light (bright field) were recorded simultaneously and combined by overlay using pseudocolors (transmission image in green, electronically changed to black/white; reflection image (ISH signal) in red).

RESULTS
KAP Gene Identification-Our recent detection of a high/ultra-high sulfur KAP gene domain on chromosome 17 led us to search for human orthologs of the high glycine-tyrosine KAP gene families KAP6-KAP8, previously described in other species (27-30). This resulted in the discovery of a putative KAP gene domain on chromosome 21q22.1, a region in which no KAP genes had been identified during the sequencing of this chromosome (33). A thorough DNA homology analysis identified two contiguous genomic DNA sequences on chromosome 21q22.1, AP001708 and AP001709, which harbored 24 high glycine-tyrosine and high sulfur KAP genes as well as 9 KAP pseudogenes (Fig. 1). The KAP gene domain analyzed covered an area of 535 kb straddling one end of both genomic sequences. All KAP genes possessed a methionine start codon and an open reading frame coding for a protein with either high glycine-tyrosine or high sulfur amino acid content. They displayed putative polyadenylation signals within ∼400 bp downstream of their stop codon, and nearly all of them contained a presumptive TATAA box sequence within 100 bp upstream of the initiation codon. In addition, a variety of high glycine-tyrosine and high sulfur KAP pseudogenes (see Table II) were identified either by the absence of an in-frame initiation codon in the region of homology or by the occurrence of frameshifts in the respective homology regions. As a rule, the genes were small (less than 1 kb) and comprised only one exon. The intergenic distances between the KAP genes varied widely, from ∼5 to 100 kb, and the genes did not share a common direction of transcription. The entire gene domain could be subdivided into one large group of 8 high sulfur KAP genes/pseudogenes at one end, followed by a central group of 24 high glycine-tyrosine KAP genes/pseudogenes, and ending with one high sulfur KAP gene (see Fig. 1).
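The gene/pseudogene criteria and the flanking-signal windows described above can be expressed as simple sequence checks. The sketch below is an illustrative reimplementation under stated assumptions, not the procedure actually used: it assumes the homology region has already been delimited from the putative start codon to the putative stop codon on the forward strand, treats a length that is not a multiple of three as a crude frameshift indicator, and takes AATAAA or its common ATTAAA variant as the polyadenylation signal. `classify_locus` and `flanking_signals` are hypothetical names.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
# Canonical polyadenylation signal and its most common variant (assumption).
POLYA_SIGNALS = ("AATAAA", "ATTAAA")


def classify_locus(coding: str) -> str:
    """Classify a homology region delimited from putative start to stop codon.

    Returns "gene" for an intact single-exon ORF, otherwise the reason for
    calling the locus a putative pseudogene."""
    coding = coding.upper()
    if not coding.startswith("ATG"):
        return "pseudogene: no in-frame ATG"
    if len(coding) % 3 != 0:
        # Crude frameshift proxy: an indel leaves a partial codon.
        return "pseudogene: length not a multiple of 3 (frameshift)"
    codons = [coding[i:i + 3] for i in range(0, len(coding), 3)]
    if codons[-1] not in STOP_CODONS:
        return "pseudogene: no terminal stop codon"
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        return "pseudogene: premature in-frame stop"
    return "gene"


def flanking_signals(genomic: str, start: int, stop_end: int) -> dict:
    """Forward-strand check for a TATAA box within 100 bp upstream of the
    start codon and a polyadenylation signal within 400 bp downstream of the
    stop codon (stop_end = index just past the stop codon)."""
    genomic = genomic.upper()
    upstream = genomic[max(0, start - 100):start]
    downstream = genomic[stop_end:stop_end + 400]
    return {
        "tataa_box": "TATAA" in upstream,
        "polya_signal": any(sig in downstream for sig in POLYA_SIGNALS),
    }
```

A 1-bp insertion such as `classify_locus("ATGGGGCTACTAA")` is reported as a frameshift, whereas the intact `"ATGGGCTACTAA"` is reported as a gene.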
No putative KAP genes could be found on the DNA sequences immediately adjacent to this domain, but the presence of a further potential high sulfur KAP gene domain, the one initially identified during the sequencing of chromosome 21 (33), was confirmed.

Characterization of the High Glycine-Tyrosine and High Sulfur KAPs-The KAP genes listed in Fig. 1 were translated into amino acid sequences, and the resulting KAP proteins were multiply aligned. Included in these multiple sequence comparisons were high glycine-tyrosine and high sulfur KAP proteins from other species, which either had already been assigned to distinct KAP families or were found in the literature without a family classification. This allowed the construction of an evolutionary tree in which human and non-human KAP proteins were grouped into distinct families (Fig. 2). In addition to the human orthologs of six previously described KAP families (KAP6, KAP7, KAP8, KAP11, KAP13, and KAP15), we identified five novel KAP families: four high glycine-tyrosine KAP families (KAP19-KAP22) and one high sulfur KAP family (KAP23). No human orthologs segregating with the members of the murine high glycine-tyrosine KAP18 family (36) (see "Discussion"), the sheep high sulfur KAP10 (2), or the mouse KAP12 proteins (10) were detected in the evolutionary tree (Fig. 2).
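Family assignment in this study rests on CLUSTAL alignments and CLUSTREE trees. The sketch below swaps in a much simpler technique to illustrate the underlying idea: percent identity is computed for each pair of pre-aligned sequences (ignoring gap columns), and sequences are placed in the same family by single-linkage grouping whenever a chain of pairwise identities reaches a chosen threshold. It builds no tree, the threshold is an arbitrary assumption, and the function names are hypothetical.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Identity over aligned columns where neither sequence has a gap ('-').
    Both sequences must come from the same multiple alignment."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def group_families(aligned: dict, threshold: float) -> list:
    """Single-linkage grouping with union-find: two sequences share a family
    if a chain of pairwise identities >= threshold connects them."""
    parent = {name: name for name in aligned}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(aligned, 2):
        if percent_identity(aligned[a], aligned[b]) >= threshold:
            parent[find(a)] = find(b)

    families = {}
    for name in aligned:
        families.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in families.values())
```

With a hypothetical toy alignment, `group_families({"a": "MCGSYYGNY", "b": "MCG-YYGNY", "c": "MCCNYYGSC", "d": "MCCNYYGSS"}, 80.0)` returns `[["a", "b"], ["c", "d"]]`. Single linkage is the simplest choice but can chain loosely related sequences together, which is one reason tree-based methods are preferred for real family assignments.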
The human high glycine-tyrosine KAPs can be divided into seven families, three of which are counterparts of KAP families previously described in other species: the KAP6 family with three members, KAP6.1-KAP6.3 (Figs. 2 and 3), and the KAP7 and KAP8 families with one member each, KAP7.1 and KAP8.1, respectively (Figs. 2, 4, and 5) (27-29). The KAP6 family members range in size from 6.6 to 11.1 kDa and exhibit a high content of glycine and tyrosine residues (54.6-60.5 mol %; see Table II). Except for KAP6.3, the completely identified KAP6 proteins from all species display unique amino-terminal (MCG(S/-)YY(G/R)NY) and carboxyl-terminal (GS(G/S)F-GYY(Y/-)) sequence motifs (Fig. 3). The 9.3-kDa KAP7.1 and the 6.8-kDa KAP8.1 proteins not only exhibit a particularly high sequence homology with their non-human counterparts (Figs. 4 and 5) but also display a distinctly low glycine-tyrosine content (34.4 and 42.8 mol %, respectively; Table II). All of the known KAP8 proteins from various species show a characteristic carboxyl-terminal amino acid sequence (RR(F/Y)(W/S)P-FALY) (Fig. 4). However, the first nine amino acids of human KAP8.1 differ significantly from the orthologous mouse and sheep sequences (Fig. 4).

Fig. 2 legend (partial): * denotes a KAP gene that did not segregate correctly. Sequences from other species are named using their accession numbers. KAPs named in black represent human sequences; red indicates mouse sequences; blue indicates sheep sequences; and green indicates rabbit sequences. CLUSTAL alignment allowed the division of KAP proteins into families but was not statistically robust enough to determine paralogous evolutionary relationships.
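The terminal motifs quoted above use a compact notation in which alternatives are bracketed, as in (G/R), and a dash inside brackets marks a position that may be absent, as in (S/-). A minimal sketch, assuming that reading of the notation and further assuming that a dash outside brackets (as in F-GYY) is merely an alignment gap, converts such motifs into Python regular expressions and tests whether a protein begins or ends with them. The helper names and test fragments are hypothetical.

```python
import re


def motif_to_regex(motif: str) -> str:
    """Convert bracketed motif notation to a regex.

    '(G/R)' means G or R; '-' inside brackets means the position may be
    absent, so '(S/-)' is an optional S. A '-' outside brackets is assumed
    to be an alignment gap and is skipped. Nested brackets are unsupported."""
    out, i = [], 0
    while i < len(motif):
        ch = motif[i]
        if ch == "(":
            close = motif.index(")", i)
            options = ["" if opt == "-" else re.escape(opt)
                       for opt in motif[i + 1:close].split("/")]
            out.append("(?:" + "|".join(options) + ")")
            i = close + 1
        else:
            if ch != "-":
                out.append(re.escape(ch))
            i += 1
    return "".join(out)


def has_terminal_motif(protein: str, motif: str, end: str) -> bool:
    """end='N' anchors the motif at the amino terminus, 'C' at the carboxyl terminus."""
    pattern = motif_to_regex(motif)
    pattern = "^" + pattern if end == "N" else pattern + "$"
    return re.search(pattern, protein) is not None
```

For instance, `motif_to_regex("MCG(S/-)YY(G/R)NY")` yields `MCG(?:S|)YY(?:G|R)NY`, in which the empty alternative makes the serine optional.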
KAP19-KAP22 represent new human high glycine-tyrosine KAP families. The large KAP19 family consists of 7 members, KAP19.1-KAP19.7, which range in size from 5.7 to 9.1 kDa and exhibit a rather high glycine-tyrosine content (∼55 mol %) (Fig. 6 and Table II). [...] (Table II) of all the high glycine-tyrosine KAPs described here. The KAP21 proteins range in size from 7.9 to 8.5 kDa and possess a distinctly lower glycine-tyrosine content of 50.6-53.1 mol % (Table II), as well as family-specific amino-terminal (MCCNYY) and carboxyl-terminal (CYS(S/C)C(Y)/-/(C)) sequences (Fig. 5). Finally, the single KAP22.1 protein is 5.2 kDa in size and exhibits the second lowest glycine-tyrosine content (37.5 mol %; Table II). All of the high glycine-tyrosine KAPs show a high degree of dimeric repeat structure, usually with glycine in the first position followed by tyrosine, cysteine, serine, or glycine in the second position. The degree of repetitiveness is highly variable, ranging from 10 to 30 repeats for the KAP6 proteins and 3-34 repeats for the [...]

Fig. 3 legend (partial): Multiple alignments were performed using the CLUSTAL program (49). The asterisks beside the protein names indicate KAP members from other species, which are designated by their accession numbers. Asterisks below the alignment indicate sequence identity; dots denote sequence homology. The sequence names in bold type denote gene products for which a cDNA has been isolated; KAP members whose expression could be shown by in situ hybridization are underlined. M95719 and Gillespie (50) are sheep sequences; d86419-d86421 and af345298 are mouse sequences; m95718 is a rabbit KAP sequence.
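The repeat counts given above come from DOTPLOT analysis. As a rough, hypothetical complement, the sketch below counts non-overlapping glycine-X dimers (X = Y, C, S, or G) scanned from left to right, reports the fraction of residues they cover, and finds the longest uninterrupted tandem run. Because it ignores degenerate or interrupted repeats, its counts should not be expected to reproduce the values in the text.

```python
import re

# Glycine followed by tyrosine, cysteine, serine, or glycine.
GX_DIMER = re.compile("G[YCSG]")
# One or more such dimers in uninterrupted tandem.
GX_RUN = re.compile("(?:G[YCSG])+")


def gx_dimer_profile(protein: str) -> dict:
    """Non-overlapping G-X dimer count, residue coverage, and longest tandem run."""
    count = len(GX_DIMER.findall(protein))
    runs = [len(m.group()) // 2 for m in GX_RUN.finditer(protein)]
    return {
        "dimers": count,
        "coverage": 2 * count / len(protein) if protein else 0.0,
        "longest_tandem_run": max(runs, default=0),
    }
```

For a fragment made entirely of such dimers, for example `"GYGYGCGS"`, the profile reports 4 dimers covering all residues in a single run of 4.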
The high sulfur portion of the gene domain described here contains four distinct families, three of which, KAP11, KAP13, and KAP15, have been previously characterized in other species (15-17, 20). The largest high sulfur family is KAP13, which consists of four members, KAP13.1-KAP13.4 (Fig. 7). The human and mouse KAP13 proteins range in size from 17.7 to 19.2 kDa, possess 12.0-12.8 mol % cysteine residues (Table II), and contain a family-specific amino-terminal end (M(S/V)Y(N/S)CCS) as well as a high number of positionally conserved amino acids (cysteine (14), serine (13), proline (7), glycine (6)). The remaining three human high sulfur KAP families, KAP11, KAP15, and KAP23, each consist of a single member. Both KAP11.1 and KAP15.1 show a particularly high sequence homology to their respective mouse orthologs (Figs. 7 and 8) and contain 14.1 and 10.9 mol % cysteine residues, respectively (Table II). The KAP13 family members (17.7-19.2 kDa) and the single KAP11 member (17.1 kDa) are the largest KAPs described here. Finally, the novel KAP23.1 segregates separately from the other KAP members in the evolutionary tree analysis (Fig. 2) and was therefore assigned to a new family (KAP23) (Fig. 8). The KAP23.1 gene encodes a 6.8-kDa protein that contains 7.7 mol % cysteine as well as high proportions of serine (20.0 mol %), glycine (13.8 mol %), and leucine (13.8 mol %). No obvious repeat structure is present in this protein.
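The mol % and kDa values cited above (Table II) follow from simple composition arithmetic: the mol % of a residue class is its share of all residues, and the average mass of a chain is the sum of its average residue masses plus one water. The sketch below assumes the standard average residue masses listed in the code, ignores disulfide bonds and other modifications, and uses hypothetical function names; it is not the software used to compile Table II.

```python
from collections import Counter

# Average residue masses (Da) of the 20 standard amino acids, i.e. the free
# amino acid minus one water; one water is added back for the intact chain.
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01528


def mol_percent(protein: str, residues: str) -> float:
    """Combined mol % of the given residues, e.g. 'GY' for glycine-tyrosine
    content or 'C' for cysteine content."""
    counts = Counter(protein)
    return 100.0 * sum(counts[r] for r in residues) / len(protein)


def average_mass_kda(protein: str) -> float:
    """Average molecular mass in kDa; each disulfide bond, if included,
    would lower this by about 2 Da (not modeled here)."""
    return (sum(RESIDUE_MASS[r] for r in protein) + WATER) / 1000.0
```

For a sequence `seq`, `mol_percent(seq, "C")` gives the cysteine content and `mol_percent(seq, "GY")` the combined glycine-tyrosine content used to separate the KAP groups.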
Like the previously described human high sulfur KAP1-KAP3 families, the high sulfur KAPs presented here show cysteine-containing repeat structures. These structures, however, are less conserved than those of the human KAP1-KAP3 families (20). For example, KAP11.1, like its mouse counterpart (15), exhibits a fairly strong decapeptide repeat structure covering the carboxyl-terminal half of the protein (see Fig. 7), although the repetitiveness is higher in the mouse sequence. Similarly, mouse KAP13.1 contains a strongly conserved decameric repeat structure beginning with a CQ(L/E) motif (16), which is only partially seen in the human KAP13 family members (Fig. 7). Both, however, additionally possess generally less well conserved pentameric repeats in their carboxyl-terminal regions. A pentameric SL(G/D)CG motif is also seen in human KAP15.1. In contrast, KAP23.1 does not appear to possess an obvious repeat structure (Fig. 8).
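Decapeptide and pentameric repeats of the kind described above appear in a self dot plot as lines parallel to the main diagonal. A minimal sketch of that idea, not the DOTPLOT program itself, scores each diagonal k by the fraction of positions at which residue i equals residue i + k and reports the best-scoring lag as the dominant repeat period. Exact identity is assumed here; dot-plot programs typically apply a window and a stringency threshold instead, which tolerates degenerate repeats such as those of human KAP13. The function names are hypothetical.

```python
def diagonal_scores(protein: str, max_lag=None) -> dict:
    """Identity fraction along each off-main diagonal of a self dot plot:
    diagonal k compares residue i with residue i + k."""
    n = len(protein)
    if n < 2:
        raise ValueError("need at least two residues")
    max_lag = min(max_lag or n // 2, n - 1)
    scores = {}
    for lag in range(1, max_lag + 1):
        matches = sum(protein[i] == protein[i + lag] for i in range(n - lag))
        scores[lag] = matches / (n - lag)
    return scores


def dominant_period(protein: str, max_lag=None) -> int:
    """Best-scoring lag; ties go to the shortest lag so that multiples of the
    true period (20, 30, ...) do not win over the period itself."""
    scores = diagonal_scores(protein, max_lag)
    return max(scores, key=lambda lag: (scores[lag], -lag))
```

For a synthetic, perfectly repeated decapeptide, `dominant_period("CQLPSGTYAV" * 3)` returns 10; real KAP repeats are imperfect, so their diagonal scores fall well below 1.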
Expression Analyses-3′-noncoding region PCR products from all of the KAP genes presented in this study were used as probes to screen a previously described arrayed human scalp cDNA library (20). This led to the isolation of three high glycine-tyrosine (KAP7.1, KAP8.1, and KAP19.1) and two high sulfur (KAP11.1 and KAP13.1) KAP cDNA clones (see Table I). The low number of positive clones, together with the identification of a partial KAP6.1 cDNA via 3′-RACE analysis of follicular RNA, suggested to us that most of the KAP genes analyzed were expressed at levels below the detection limit of our arrayed cDNA library, which contains only 26,000 clones. We therefore subcloned all of the PCR products used in the cDNA library screen (Table I) and used them as cRNA probes for in situ hybridization studies of both plucked beard hairs and scalp sections. This demonstrated mRNA expression for a total of 17 KAP genes, 11 high glycine-tyrosine and 6 high sulfur members (Fig. 9). With the exception of KAP7.1, all of the high glycine-tyrosine and high sulfur KAP genes showed similar degrees of expression in the two hair types, but the individual expression patterns varied strongly from gene to gene. The high glycine-tyrosine KAP genes KAP7.1, KAP8.1, KAP19.1, and KAP19.2 and the high sulfur KAP gene KAP11.1 were strongly expressed. Interestingly, the prominent expression of KAP7.1 was limited to scalp follicles, whereas beard hair follicles showed drastically reduced expression (Fig. 9, B and B′). In contrast, the remaining KAP genes displayed remarkably low levels of expression (Fig. 9). The expression patterns of the high glycine-tyrosine KAP genes could be further subdivided. KAP6.1 (Fig. 9A), KAP7.1 (Fig. 9, B and B′), KAP19.1 (Fig. 9D), and KAP19.2 (Fig. 9E) transcripts clearly occurred in the upper portion of the hair cortex.
Of these, KAP6.1 and, in particular, KAP19.1 transcripts exhibited a vertical asymmetry of expression in the cortex. Moreover, KAP19.1 showed an additional region of expression in the hair cuticle, occurring nearly simultaneously with the onset of cortical mRNA expression (Fig. 9D).
In contrast, the mRNA expression of KAP8.1 (Fig. 9C), KAP19.3 (Fig. 9F), KAP19.6 (Fig. 9H), KAP19.7 (Fig. 9I), KAP20.1 (Fig. 9K), and KAP21.2 (Fig. 9L) [...] these KAP genes also showed cuticular expression. For example, KAP19.6 and KAP21.2 exhibited an onset of expression in the upper portion of the cuticle, which occurred much later than their respective cortical expression. Two of the high glycine-tyrosine KAP genes (KAP8.1 and KAP19.4) displayed remarkable expression patterns. In the first case, the particularly strong KAP8.1 mRNA expression was essentially restricted to one vertical half of the hair-forming compartment and, in beard hairs, was clearly absent from the central medulla (Fig. 9C). The absence of medullary expression in beard hair sections was also observed for the other KAP genes. In the second case, KAP19.4 expression was unique in that it occurred only in the upper portion of the hair cuticle; cortical expression, if present at all, was extremely weak (Fig. 9G). Finally, for six high glycine-tyrosine KAP genes (KAP6.2, KAP6.3, KAP19.5, KAP20.2, KAP21.1, and KAP22.1), neither cDNA library screening nor in situ hybridization demonstrated expression in hair follicles.
With the exception of the extraordinarily strong KAP11.1 expression in the late matrix and the entire cortex (Fig. 9M), the expression of the remaining high sulfur KAP genes in these areas was generally very weak (Fig. 9, N-R). As for several of the high glycine-tyrosine KAP members, a vertical asymmetry was seen for KAP23.1 transcripts (Fig. 9R). Clear-cut cuticular expression was detected for KAP13.2 (Fig. 9O), KAP15.1 (Fig. 9Q), and KAP23.1 (Fig. 9R), with an onset well after the initiation of the respective cortical expression. Interestingly, the cuticular expression of KAP13.2 appeared distinctly stronger than its cortical expression and was thus reminiscent of the expression pattern of the high glycine-tyrosine KAP19.4 gene (Fig. 9G). The only high sulfur KAP gene for which no expression in the human hair follicle could be demonstrated was KAP13.4.
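The interpretation given earlier, that most KAP transcripts fell below the detection limit of the 26,000-clone library, can be made quantitative with the standard library-completeness calculation. Assuming that clones sample transcripts independently and in proportion to their abundance, the probability that a transcript of relative abundance f is represented at least once among N clones is 1 - (1 - f)^N. The sketch below, with hypothetical function names, evaluates this probability and the library size needed to reach a given confidence.

```python
import math


def p_detect(abundance: float, library_size: int) -> float:
    """Probability that a library of library_size independent clones holds
    at least one clone of a transcript making up `abundance` of all mRNA."""
    return 1.0 - (1.0 - abundance) ** library_size


def clones_needed(abundance: float, confidence: float = 0.99) -> int:
    """Smallest library size giving at least `confidence` of containing the transcript."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - abundance))
```

Under these assumptions, a transcript present at 1 copy per 26,000 mRNAs would be missing from a 26,000-clone library about 37% of the time (`p_detect(1/26000, 26000)` is roughly 0.63), and a transcript at 1 in 10,000 would need a library of roughly 46,000 clones for 99% confidence.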

DISCUSSION
The preliminary identification of a KAP gene cluster on chromosome 21q23 by the Human Genome Sequencing Project (33) led us to characterize an additional domain on chromosome 21q22.1, in the manner recently described for the elucidation of a human high/ultra-high sulfur KAP gene locus on chromosome 17q21.2 (20). Analysis of the EBI/GenBank™ database allowed the identification of 24 high glycine-tyrosine and high sulfur KAP genes on the contiguous genomic sequences AP001708 and AP001709, which are part of chromosome 21q22.1. Like the high/ultra-high sulfur KAP genes on chromosome 17, as well as KAP genes identified in other species (2,20), all of the KAP genes at this locus were small (<1 kb) and consisted of only one exon. Remarkable, however, was the insertion of the high glycine-tyrosine KAP gene domain into the high sulfur KAP gene domain.
Homology comparison of the human high glycine-tyrosine KAP members with members from other species led to the division of these KAPs into seven families. Historically, non-human high glycine-tyrosine KAPs were divided by biochemical separation techniques into highly glycine-tyrosine-rich type II members (>60% glycine-tyrosine content) and a second, type I group with a lower glycine-tyrosine content (2,37). Subsequent sequence homology comparisons in sheep and mouse led to the initial establishment of three high glycine-tyrosine families: the type II KAP6 family and the type I KAP7 and KAP8 families (2,38). In the current report, three human KAP6 members and one KAP7 and one KAP8 member were identified based on their amino acid homology to sheep and mouse KAPs.
Interestingly, four KAP6 members are known to date in the mouse (29,36), two of which (d86420 and d86421; see Fig. 3) have completely identical amino acid sequences, while their mRNAs possess differing 5′- and 3′-noncoding regions. In the case of the KAP8 family, a total of four members are known in sheep, two of which (p02448 and x05639; see Fig. 4) differ from each other at only two positions and may therefore represent polymorphic variants (28, 39-41). All in all, the present composition of the KAP6 and KAP8 families in humans, mouse, and sheep clearly suggests that gene number varies among species. Since humans possess two KAP8 pseudogenes (see Fig. 1), the low number of functional human KAP8 members might be partially due to a species-specific inactivation of KAP8 genes during evolution. A similar type of gene loss has been described recently for the human type I keratin pseudogene hHaA, which possesses functional orthologs in the chimpanzee and gorilla (42).
Initially, we assumed that the gene encoding the single human KAP7.1 protein, which shows a high sequence identity to a previously described sheep gene sequence (28), was a pseudogene because of a 1-base pair insertion in the corresponding DNA sequence on AP001709 (nucleotide 150,677), which would cause a frameshift within the region coding for the glycine-tyrosine-rich portion of KAP7.1. However, in the course of a related study in this laboratory, a KAP7.1 cDNA containing a completely intact open reading frame was isolated (results not shown). Subsequent PCR-based cDNA/genomic analyses of four unrelated individuals clearly showed that all possessed complete open reading frames for KAP7.1. In addition, we were able to isolate a partial KAP7.1 clone from our arrayed human scalp cDNA library (see Table I) and to successfully perform in situ hybridization studies on human hair follicles using a 3′-noncoding region of the KAP7.1 mRNA as a probe (Fig. 9B). Therefore, either this gene has been inactivated in a minority of the human population, or the observed KAP7.1 frameshift on AP001709 was simply a DNA sequencing error.
Sequence comparison of the remaining human high glycine-tyrosine KAPs with hitherto unclassified KAPs from other species allowed their classification into the new high glycine-tyrosine KAP families KAP19-KAP22. In this context, the large KAP19 family appears especially interesting, as its six human members co-segregated with two of ten murine KAPs (af345294 and af4777980; see Figs. 2 and 6) whose genes were down-regulated in hair follicles of Hoxc13-overexpressing mice (36). While two of the remaining eight murine KAPs segregated with either the KAP6 family (af345298; Figs. 2 and 3) or the KAP21 family (af345297; Figs. 2 and 5), the remaining six murine KAPs remarkably formed a distinct, hitherto unidentified high glycine-tyrosine KAP family, termed here the KAP18 family (Fig. 2). Although a certain degree of similarity exists between the mouse KAP18 members and the human KAP19 members (data not shown), the degree of divergence seen in Fig. 2 suggests that the murine KAP18 family indeed represents a family of its own. Much like the KAP18 family, the single-member KAP22 family (KAP22.1; Fig. 5) has at present been observed in only one species, in this case humans.
It should be emphasized that analysis of the current working draft of the human genome identified only one further putative high glycine-tyrosine KAP gene sequence outside the KAP gene domain characterized here. This putative pseudogene is located on chromosome 13q14 on BAC clone AL138686 (nt 63432-63717). We therefore believe that the entire set of human high glycine-tyrosine KAP genes is described here.
Remarkably, the genes encoding the seven high glycine-tyrosine KAP families on the contiguous DNA sequences AP001708 and AP001709 are flanked by genes that code for high sulfur KAP proteins (see Fig. 1). Amino acid homology comparisons of the seven high sulfur KAPs led to their division into four families (KAP11, KAP13, KAP15, and KAP23), three of which have counterparts described in other species. The largest family, KAP13, consists of four members that show high sequence homology to murine KAP13.1, originally designated 4C32 (16). The three remaining human high sulfur KAP families comprise only single members. Based on amino acid homology, human KAP11.1 appears to be the ortholog of a mouse KAP protein originally termed hacl-1, which was identified accidentally in the search for the O6-methylguanine-DNA methyltransferase cDNA (15). The single human high sulfur KAP15.1 protein most probably represents the ortholog of mouse KAP15.1 (Fig. 8), one of two related KAP proteins initially termed pmg1 and pmg2 (17) but recently renamed mKAP14.2 and mKAP15.1, respectively (see Ref. 20 for explanation), with which it co-segregates in the CLUSTREE analysis (Fig. 2). It has recently been shown that the mouse KAP14.2 and KAP15.1 genes are adjacent to each other and have opposite directions of transcription (17). Interestingly, upstream of the human KAP15.1 gene lies a putative KAP pseudogene, designated KAP13A, which exhibits fairly high homology with both mKAP14.2 and the adjacent human KAP13 genes (Fig. 1 and results not shown). This was confirmed by a further genome-wide BLASTN search with the mKAP14.2 sequence, which revealed roughly comparable sequence homologies only with the KAP13A, hKAP15.1, and hKAP13 genes. In addition, the human KAP13A and KAP15.1 genes/pseudogenes exhibit, like the mKAP14.2 and mKAP15.1 genes, opposite directions of transcription (Ref. 17 and Fig. 1).
It is therefore tempting to speculate that KAP13A might represent an orthologous but inactive form of mKAP14.2, and the absence of further orthologs of the mKAP14.2 gene in the human genome database would point to hKAP15.1 as the only functional human ortholog of the murine KAP14.2/KAP15.1 domain.
The sequencing of chromosome 21 and its initial gene analysis by bioinformatic methods identified a surprisingly low number of 225 putative gene and 98 pseudogene loci (33). Recently, these findings have been challenged in two articles that combined gene prediction with mRNA expression analysis, one of which suggested that the number of genes on this chromosome may be up to 10-fold higher (43,44). Our data support this view, because the gene prediction analysis in the chromosome 21 sequencing paper found only two exons in the domain described here. Further biochemical and molecular biological validation of the bioinformatically generated gene prediction data of the Human Genome Programs therefore appears necessary.
In general, the expression characteristics of the human high glycine-tyrosine and high sulfur KAP genes described here correspond fairly well to previous findings in other species. Earlier protein studies showed that the follicular expression of high glycine-tyrosine KAPs varies considerably among species, being very weak in human hair but strong in sheep wool and mouse hair (45). This is in line with our finding that only cDNA clones of the most strongly expressed high glycine-tyrosine KAP members, KAP7.1, KAP8.1, and KAP19.1 (Fig. 9), were obtained from the scalp cDNA library screen (Table I). Moreover, previous in situ hybridization studies of KAP6 family members in mouse, rabbit, and sheep have shown that their expression also occurs preferentially in the upper portion of the hair and wool cortex (27,29). This was also true for murine KAP8 family members (29), whereas human and sheep KAP8 expression begins much earlier, in the matrix region (Ref. 2 and Fig. 9C). Whereas in sheep and humans the cortical expression of KAP6 and KAP8 family members is vertically compartmentalized (Refs. 2 and 27 and Fig. 9, A and C), this is not the case for mouse KAP6 and KAP8 members, which clearly exhibit an even cortical expression (29).
In mice, in situ hybridization studies of individual high glycine-tyrosine KAP19 (d86422 in Fig. 6), KAP20 (d88901 in Fig. 5), and KAP21 (d89902 in Fig. 5) members have shown that their expression is limited to the upper portion of the murine hair cortex (29). This corresponds to the expression of the human KAP19.1, KAP19.2, and KAP20.1 genes (Fig. 9, D, E, and K) but deviates from that of human KAP21.2, whose expression initiated further down, in the hair matrix (Fig. 9L). Surprising and at present unexplained are the striking difference in KAP7.1 expression between scalp and beard hair follicles (Fig. 9, B and B′) and the unique cuticular expression of KAP19.4 compared with the cortical expression pattern of the other KAP19 members.
The majority of the high sulfur KAP genes analyzed in this study show a generalized mRNA expression in the upper matrix and the entire hair cortex, with KAP13.2, KAP15.1, and KAP23.1 also exhibiting cuticular expression. The strongly expressed KAP11.1 displays an expression pattern nearly identical to that of its mouse ortholog (15). The demonstration of cortical KAP13.1-KAP13.3 expression in the human hair follicle was surprising, because no such expression has been reported in mice (16). Instead, mouse KAP13.1 was found in the periderm of embryonic mice as well as in the filiform tongue papillae and the parakeratotic tail epidermis of adult mice (16). To date, analysis of KAP13.1-KAP13.3 gene expression in human tongue and periderm epithelium has been hampered by the unavailability of these tissues. The weak expression of the human KAP15.1 gene in the matrix, cortex, and cuticle of the hair follicle (Fig. 9Q) has not yet been confirmed for its animal orthologs; although a specific mouse KAP15.1 antibody is available, it has only been used in Western blots to demonstrate mKAP15.1 in mouse mammary glands and epidermis (17).
The low mRNA expression of many members of the KAP families presented here raises questions concerning the synthesis of their respective proteins. This is especially important because recent large scale mRNA/protein studies in yeast and human liver show only a moderate correlation between mRNA transcript and protein levels (46-48). This appears especially true for genes with low mRNA expression; in one study, transcripts with similar mRNA levels exhibited over 30-fold differences in protein expression (47). The mRNA expression data described here are therefore no guarantee of concomitant protein expression. A detailed analysis of KAP protein expression would go beyond the bounds of this paper and will be a challenging task. For example, high glycine-tyrosine KAPs, because of their strong amino acid conservation and hydrophobic nature, are poorly suited to the production of both pan- and individual antibodies. The small size of these proteins (<10 kDa) makes two-dimensional protein separation difficult, especially in a system amenable to Western blot analysis. If these obstacles can be overcome, future analyses, possibly using mass spectrometry (matrix-assisted laser desorption ionization time-of-flight, MALDI-TOF) for protein identification, could lead to a final elucidation of KAP protein expression in the hair follicle.

Acknowledgments-We thank Dr. Alexander Awgulewitsch (Medical University of Charleston, Charleston, SC) for providing a previously unpublished mouse KAP18 cDNA sequence.