Evolution of transglutaminase genes: identification of a transglutaminase gene cluster on human chromosome 15q15. Structure of the gene encoding transglutaminase X and a novel gene family member, transglutaminase Z.

We isolated and characterized the gene encoding human transglutaminase (TG)(X) (TGM5) and mapped it to the 15q15.2 region of chromosome 15 by fluorescence in situ hybridization. The gene consists of 13 exons separated by 12 introns and spans about 35 kilobases. Further sequence analysis and mapping showed that this locus contained three transglutaminase genes arranged in tandem: EPB42 (band 4.2 protein), TGM5, and a novel gene (TGM7). A full-length cDNA for the novel transglutaminase (TG(Z)) was obtained by anchored polymerase chain reaction. The deduced amino acid sequence encoded a protein with 710 amino acids and a molecular mass of 80 kDa. Northern blotting showed that the three genes are differentially expressed in human tissues. Band 4.2 protein expression was associated with hematopoiesis, whereas TG(X) and TG(Z) showed widespread expression in different tissues. Interestingly, the chromosomal segment containing the human TGM5, TGM7, and EPB42 genes and the segment containing the genes encoding TG(C),TG(E), and another novel gene (TGM6) on chromosome 20q11 are in mouse all found on distal chromosome 2 as determined by radiation hybrid mapping. This finding suggests that in evolution these six genes arose from local duplication of a single gene and subsequent redistribution to two distinct chromosomes in the human genome.

Transglutaminases (EC 2.3.2.13) are a family of structurally and functionally related enzymes that stabilize protein assemblies through the formation of intra- or intermolecular Nε-(γ-glutamyl)lysine bonds. Enzymes of this family catalyze a Ca2+-dependent transferase reaction between the γ-carboxamide group of a peptide-bound glutamine residue and various primary amines, most commonly the ε-amino group of lysine residues (1,2). Seven different transglutaminase gene products have previously been characterized in man (3,4) and found to have specialized in the cross-linking of proteins in different biological processes. Functions include fibrin clot stabilization in hemostasis, semen coagulation, formation of cornified envelopes in keratinization, and stabilization of extracellular matrix structures (3,5). The essential role of transglutaminases in these processes is witnessed by the serious impairment of wound healing and keratinization, which are associated with transglutaminase deficiencies (6,7). Besides playing a structural role, transglutaminase cross-linking has been shown recently to have a profound effect on cells by regulating the biological activity of signaling molecules such as transforming growth factor-β, interleukin-2, and midkine as well as by modulating cell-matrix interactions (5). Transglutaminases can promote cell-matrix interaction through both cross-linking of cell surface-associated fibronectin (8,9) and noncovalent binding to integrins and fibronectin (10). The latter finding further supports a function of transglutaminases as structural proteins in addition to their enzymatic role and is consistent with the loss of catalytic activity of one member of the gene family, band 4.2 protein. Even though each type of transglutaminase has its own typical tissue distribution, the individual enzymes are present in a number of different tissues and often in combination with other transglutaminases. 
The recently reported unexpected absence of a distinct developmental phenotype in TG C 1 null mice (11) may relate to co-expression with other transglutaminases and indicates redundancy in this protein family for the first time.
All transglutaminase enzymes are encoded by a family of closely related genes. Alignment of the gene products reveals a high degree of sequence similarity, and all family members exhibit a similar gene organization with remarkable conservation of intron distribution and intron splice types. Comparison of the structure of the individual genes shows that they may be divided into two subclasses (3,12), wherein the genes encoding TG C (13), TG E (14), band 4.2 protein (15), and TG P (16) contain 13 exons, and the genes encoding factor XIII a-subunit (17) and TG K (12,18,19) contain 15 exons (for an explanation of nomenclature see Table I). Exon IX of the former group is separated into two exons (X and XI) in the TGM1 and F13A1 genes, and the nonhomologous N-terminal extensions of factor XIII a-subunit and TG K are comprised by an additional exon. Phylogenetic analysis also indicated that an early gene duplication event gave rise to two different lineages, one comprising TG C , TG E , and band 4.2 protein and the other comprising factor XIII a-subunit and TG K (12). Considering the similarity in gene structure, protein primary structure, and three-dimensional folding (20 -22) as well as catalytic mechanism (23), transglutaminase genes seem derived from a common ancestral gene, which itself is related to cysteine proteases (24). Transglutaminase genes were thought to be scattered in the human genome, because the genes encoding TG K and factor XIII a-subunit have been mapped to chromosomes 14 (18) and 6 (25), respectively, whereas TG C and TG E have been mapped to chromosome 20 (14,26), band 4.2 protein to chromosome 15 (27,28), and TG P to chromosome 3 (29,30).
We recently have isolated a cDNA encoding a novel member of the transglutaminase gene family, TG X, from human foreskin keratinocytes (4). Two related transcripts with apparent sizes of 2.2 and 2.8 kb were obtained that encoded proteins of 638 and 720 amino acids with molecular masses of 72 and 81 kDa, respectively. We now have characterized the structure of the gene encoding TG X and shown that the two previously isolated gene products result from alternative splicing of exon III. We have mapped the gene to the 15q15.2 region of chromosome 15. Analysis of its flanking sequences revealed the presence of a cluster of three transglutaminase genes within ~100 kb, including the gene encoding band 4.2 protein and a novel transglutaminase gene (TGM7). We have further characterized the new transglutaminase, TG Z, encoded by the TGM7 gene by determining its primary structure and tissue distribution. Our findings provide new insights into the evolution of the transglutaminase gene family.

MATERIALS AND METHODS
Reagents-Oligonucleotides were from Oligos Etc., Inc. (Wilsonville, OR) or Life Technologies, Inc., and restriction enzymes were from Promega (Madison, WI). Image clones with GenBank TM accession numbers AI018564, AI024635, and AW511368 were obtained from Genome Systems, Inc. (St. Louis, MO). Reagents for cell culture were from Life Technologies, Inc.
Genomic Library Screening-A human BAC library established in an F-factor-based vector, pBeloBAC 11, and maintained in Escherichia coli DH10B (31) was screened by PCR (Genome Systems). A 147-bp DNA fragment unique to TG X (4) was amplified from 100 ng of genomic DNA in 100 μl of 10 mM Tris/HCl, pH 8.3, 50 mM KCl containing 2 mM MgCl2, 0.2 mM dNTPs using 2.5 units of Taq DNA polymerase (Fisher), 50 pmol of upstream primer P1 (5′-CCACATGTTGCAGAAGCTGAAGGCTAGAAGC), and downstream primer P2 (5′-CCACATGTCCACATCACTGGGTCGAAGGGAAGG). PCR cycles (Robocycler, Stratagene, La Jolla, CA) were 45 s at 94°C (denaturation), 2 min at 60°C (annealing), and 3 min at 72°C (elongation) for a total of 37 cycles, with the first cycle containing an extended denaturation period (6 min), during which the DNA polymerase was added (hot start), and the last cycle containing an extended elongation period (10 min). Two positive clones were identified, BAC-33(P5) and BAC-228(P20) (Genome Systems), and their identity was verified by Southern blotting. BAC plasmid DNA was prepared using a standard alkaline lysis protocol (32) with the modifications recommended by Genome Systems. 2 μg of BAC plasmid DNA was restricted with BamHI, EcoRI, and SpeI and probed with a 32P-labeled ~500-bp NcoI/BspHI and ~600-bp BspHI/NdeI cDNA fragment of TG X, respectively, as described below.
DNA Preparation and Sequencing-Plasmid DNA from BAC clones was further purified for direct sequencing by digestion with 200 μg/ml RNase A (Sigma) for 1 h at 37°C and by subsequent microdialysis using Spectra/Por 2 membranes (Spectrum Medical Industries, Inc., Laguna Hills, CA). PCR products were gel-purified using the QIAquick gel extraction kit (Qiagen, Chatsworth, CA) for sequencing. Cycle sequencing was performed by the dideoxy chain termination method using the Cyclist Exo− Pfu DNA sequencing kit (Stratagene) and precast 6% polyacrylamide gels with the CastAway sequencing system (Stratagene) or using the dRhodamine Terminator Cycle Sequencing Ready reaction kit (Applied Biosystems, Warrington, UK) and an ABI 310 automated sequencer.
Rapid Amplification of 5′-mRNA Ends-A modified RACE protocol (4) was used to determine the transcription start site and obtain additional sequence information of exon I of TG X. Briefly, double-stranded cDNA was prepared from poly(A)+ RNA of cultured normal human keratinocytes (prepared as described previously (4)) with the Copy kit (Invitrogen, San Diego, CA). The cDNA was purified from nucleotides using the GlassMax DNA isolation kit (Life Technologies, Inc.) and tailed in the presence of 200 μM dCTP with 10 units of terminal deoxynucleotidyl transferase (Promega) for 30 min at 37°C to anchor the PCR at the 5′ end. The PCR was anchored by performing a total of 5 cycles of one-sided PCR at a lower annealing temperature (37°C) with the abridged anchor primer (Life Technologies, Inc.) only. After the transfer of 25% of this reaction at 94°C to a new tube containing abridged anchor primer and TG X-specific primer P3 (see above), the first round of amplification was carried out for a total of 37 cycles under the conditions described above except for annealing at 55°C. Nested PCR was done with the universal amplification primer (Life Technologies, Inc.) and TG X-specific primer P4 (5′-TGAAGTACAGGGTGAGGTTGAAGG) as described above (annealing at 60°C) using 1 μl of the first round PCR.
Primer Extension Analysis-Oligonucleotide P5, 5′-CATGGTAGCTGCCTCCGGTTCCTG, containing a 5′-infrared label (IRD 800) was purchased from MWG Biotech (Ebersberg, Germany). Primer P5 (5.3 pmol) was hybridized to 1 μg of poly(A)+ RNA from primary keratinocytes (4), and reverse transcription was performed with 200 units of Superscript II RNase H− reverse transcriptase (Life Technologies, Inc.) in a total volume of 20 μl for 90 min at 42°C according to manufacturer instructions. The enzyme was heat-inactivated, and the primer extension products were extracted with phenol/chloroform, precipitated with ethanol, and then analyzed on a 4.5% denaturing polyacrylamide gel adjacent to dideoxynucleotide chain termination sequencing reactions (Thermo Sequenase cycle sequencing kit, Amersham Pharmacia Biotech) derived from a double-stranded genomic DNA fragment using the same primer.
Southern Blotting-18 μg of human genomic DNA was digested with BamHI, EcoRI, and HindIII restriction enzymes, separated in a 0.8% agarose gel, and transferred to a Zeta-probe membrane (Bio-Rad). The gel was calibrated using the Lambda DNA/HindIII markers (Promega). 32P-labeled probes were prepared by random prime labeling using the Multiprime DNA labeling system (Amersham Pharmacia Biotech), and PCR products corresponding to intron 2 and exon X were used as DNA templates. Probes were hybridized to the blot overnight at 65°C in 500 mM NaH2PO4, pH 7.5, containing 1 mM EDTA and 7% SDS. The membrane was washed at 65°C to a final stringency of 40 mM NaH2PO4, pH 7.5, 1 mM EDTA, and 1% SDS, and the result was developed by exposure of the membrane to BioMax MR film (Eastman Kodak Co.).
Chromosomal Localization-Human peripheral blood lymphocytes were used to prepare metaphase chromosome spreads (33). Briefly, cells were cultured in PB-Max karyotyping medium (Life Technologies, Inc.) for 72 h and synchronized by culture in the presence of 10⁻⁷ M amethopterin (Fluka, Buchs, Switzerland) for another 24 h. Cells were released from the mitotic block by extensive washing and subsequent culture in the above medium containing 10⁻⁵ M thymidine for 5 h. The cells were subsequently arrested in metaphase by the addition of colcemid to a final concentration of 0.1 μg/ml (Life Technologies, Inc.). Harvested cells were incubated in 0.075 M KCl for 25 min at 37°C and fixed in methanol/acetic acid (3:1) solution, and chromosome spreads were prepared by dropping the cells onto the glass slides. After air drying, chromosomes were treated with 100 μg/ml RNase A in 2× standard saline citrate for 1 h at 37°C, denatured in 70% (v/v) formamide in 2× standard saline citrate for 3 min at 75°C, and dehydrated in a graded ethanol series. DNA probes were prepared by random prime labeling of plasmid DNA of BAC-33(P5) and BAC-228(P20) with fluorescein-conjugated dUTP using the Prime-It Fluor fluorescence labeling kit (Stratagene). The probes were denatured at 75°C for 10 min in hybridization buffer consisting of 50% formamide (v/v) and 10% dextran sulfate (w/v) in 4× standard saline citrate and prehybridized at 42°C for 20 min to 0.2 μg/ml human competitor DNA (Stratagene) to block repetitive DNA sequences. The probes were subsequently hybridized to the chromosome spreads at 37°C overnight, followed by washing to a final stringency of 0.1× standard saline citrate at 60°C. Spreads were mounted in phosphate-buffered glycerol containing 200 ng/ml propidium iodide to counterstain chromosomes. 
Slides were examined by epifluorescence microscopy using a ×100 objective, and the images were captured with a DC-330 charge-coupled device camera (DAGE-MTI, Inc., Michigan City, IN) using an LG-3 frame grabber board (Scion Corp., Frederick, MD) in a Macintosh 8500 work station and a modified version of the NIH Image 1.6 software (Scion Corp.). Images representing fluorescein labeling and propidium iodide staining of the same field were superimposed using Adobe PhotoShop 3.0 (Adobe Systems, Inc., Mountain View, CA) to map the gene to a chromosomal region.
Genomic Organization of the EPB42, TGM5, and TGM7 Genes-PCR amplification of band 4.2 sequences was performed as described for the genomic library screening but using 100 ng of BAC plasmid DNA and the oligonucleotides 5′-GTTCTAGGCTTCTCTAGTTGGCAGG (forward) and 5′-CGCTGGCTTGGCTCACCCTGTCCC (reverse) for 5′-UTR/exon I or 5′-CCTGAAAACACCATGTGTGCCAAG (forward) and 5′-GTTCAGGGGCTACCACGGTGACGC (reverse) for exon XIII. Southern blots of BAC plasmid DNA restricted with BamHI, EcoRI, and SpeI were probed with the respective 32P-labeled fragments of band 4.2 protein as described and compared with those probed for TG X. Long range genomic PCR in combination with direct sequencing from the BAC clones was used to isolate the sequence between the EPB42 and TGM5 genes. PCRs were performed with Pfu Turbo DNA polymerase as described above but using 5% Me2SO and the primers 5′-GATTCCACTGTGTCCCATCCAGA (forward) and 5′-GTCACCTCTCAACAGGACAAGGG (reverse) and 5′-GGGCCATTACCCTAGTCTCTTATTG (forward) and 5′-TAATAAAGTGTGACCAGCCTTCCTAG (reverse). A total of 36 PCR cycles were carried out with a combined annealing/extension step at 68°C for 10 min and denaturation at 94°C for 45 s. The location of the TGM7 gene with respect to the TGM5-EPB42 gene cluster was confirmed in a similar manner using oligonucleotides 5′-TGGGCAAGGCGCTGAGAGTCCATG (forward, P7) and 5′-GTTTACCTGTCTGCCTCTACGCTG (reverse) and the Taq-Plus Long PCR system (Stratagene) according to manufacturer instructions.
Cloning of TG Z by Anchored PCR-PC-3 cells (European Collection of Cell Cultures, Salisbury, UK), established from a human prostate adenocarcinoma, were cultured in Coon's modified Ham's F-12 medium (Sigma) containing 10% (v/v) heat-inactivated fetal bovine serum, 100 units/ml penicillin, and 100 μg/ml streptomycin. Total RNA was isolated using Tri Reagent™ (Sigma) according to manufacturer instructions. When needed, poly(A)+ RNA was prepared using the Micro-Fast Track 2.0 kit (Invitrogen). cDNA was synthesized from 2 μg of total RNA or 500 ng of poly(A)+ RNA by reverse transcription for 1 h at 42°C using 200 units of Superscript II RNase H− reverse transcriptase and 0.025 μg/μl oligo(dT)15 primer in a 20-μl reaction mixture containing 0.5 mM dNTPs and 10 mM dithiothreitol in first strand buffer. For the cloning of TG Z, we used a series of degenerate and gene-specific oligonucleotides to isolate overlapping DNA fragments, essentially following our previously described strategy (4). Isolated and sequenced TG Z fragments in a 5′→3′ direction were obtained with the following primers: 5′-CAACCTTGCGGCTTGAGTCTGTCG (forward, P8) and 5′-CAGCAGCTCTGACGGCTTGGGTC (reverse, P9); P8 and 5′-CATACACCACGTCGTTCCGCTG (reverse, P10), with nested reactions 5′-ATCACCTTTGTGGCTGAGACCG (forward) and 5′-CAAGGTTAAAAAGTAGGATGAAAGTTC (reverse) and 5′-CACAGTGTGACTTACCCGCTG (forward, P11) and P10; P11 and 5′-CGATGGTCAAGTTCCTATCCAGTTG (reverse, P12), with nested reaction 5′-CTTAAAGAACCCGGCCAAAGACTG (forward) and P12; 5′-TGTTGTTTCCAATTTCCGTTCCGC (forward) and 5′-TCTGGCACCCTCTGGATACGCAG (reverse); 5′-CTTAGGGATCAGCCAGCGCAGC (forward, P13) and 5′-GGGTGACATGGACTCTCAGCG (reverse, P14), with nested reactions P13 and 5′-TATCTTTTAGGACCAGCATGGACCTC (reverse) and 5′-GCGGATGAACCTGGACTTTGG (forward) and P14; and 5′-TGGGCAAGGCGCTGAGAGTCCATG (forward) and 5′-AGGACAGAGGTGGAGCCAAGACGACATAGCC (reverse). 
PCRs were performed with 2 μl of the reverse transcription reaction using 1.25 units of AmpliTaq Gold DNA polymerase (Applied Biosystems) and the respective buffer supplemented with 2 mM MgCl2, 0.2 mM dNTPs, and 25 pmol of each primer in a total volume of 50 μl. 40 PCR cycles were carried out in a GeneAmp 9600 thermal cycler (Applied Biosystems), each cycle consisting of denaturation at 95°C for 45 s, annealing at 60°C for 1 min, and extension at 72°C for 1 min, with the first cycle containing an extended denaturation period (10 min) for the activation of the polymerase and the last cycle containing an extended elongation period (10 min). The 5′ end of the cDNA was isolated by 5′-RACE as described above with the exception of using the gene-specific oligonucleotides P9, 5′-TGAAGCTCAGCCGGAGGTAGAAG, and 5′-GACAGACTCAAGCCGCAAGGTTG. Amplified products were analyzed on 1% agarose gels, extracted using the QIAquick gel extraction kit, and sequenced as described.
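As a quick consistency check on the primer list, the following Python sketch computes reverse complements. It is illustrative only and not part of the original protocol; the function name `reverse_complement` is our own. Read as printed, the last gene-specific 5′-RACE oligonucleotide is the reverse complement of forward primer P8 without its 3′-terminal base, as would be expected for an antisense primer placed at the 5′ end of the known sequence.

```python
# Translation table for the four unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


# Sequences copied from the primer list in the text.
P8 = "CAACCTTGCGGCTTGAGTCTGTCG"            # forward primer P8
RACE_PRIMER = "GACAGACTCAAGCCGCAAGGTTG"    # last 5'-RACE oligonucleotide

print(reverse_complement(P8))
# True if the RACE primer is antisense to P8 minus its 3'-terminal base.
print(RACE_PRIMER == reverse_complement(P8[:-1]))
```

The same helper can be applied to any forward/reverse pair in the list to confirm that a primer described as "reverse" is antisense to the cDNA.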
Northern Hybridization-A human RNA master blot containing poly(A)+ mRNA of 50 different tissues was obtained from CLONTECH. 32P-labeled probes were prepared by random prime labeling of DNA fragments of the different transglutaminase gene products using the Multiprime DNA labeling system. DNA fragments of 500-700 bp comprising the 3′ end of TG X, TG Z, and band 4.2 protein were generated by restriction with PstI and AccI, NcoI and NotI (exons XII and XIII), and XhoI, respectively. The cDNA encoding human band 4.2 protein (34) was kindly provided by Dr. Carl M. Cohen (Boston, MA). Hybridization was performed under the conditions recommended by the manufacturer. The labeled membrane was exposed to BioMax MR film, and the films were developed after 15-24 h for first exposure and 3-5 days for second exposure.
Amplification of TG Z from Different Tissues-cDNA from various cell lines and human tissue was prepared as described previously (4). A panel of cDNAs from human tissue (multiple tissue cDNA panel I) was obtained from CLONTECH. A 365- or 287-bp fragment of TG Z was amplified by PCR using the oligonucleotides 5′-TGGGCAAGGCGCTGAGAGTCCATG (forward) and 5′-GCTGGAGGGCGGGTCTCAGGGAGC (reverse) or 5′-AGGACAGAGGTGGAGCCAAGACGACATAGCC (reverse), respectively, with an annealing temperature of 60°C.
Mapping of Transglutaminase Genes in Mouse Genome-The 100 radiation hybrid clones of the T31 mouse/hamster radiation hybrid panel (35) (Research Genetics, Huntsville, AL) were screened by PCR. A 139-bp fragment of the tgm5 gene was amplified with primers 5′-TGAGGACTGTGTGCTGACCTTG (forward) and 5′-TCCTGTGTCTGGCCTAGGG (reverse), a 149-bp fragment of the epb42 gene with primers 5′-CAGGAGGAGTAAGGGGAATTGG (forward) and 5′-TGCAGGCTACTGGAATCCACG (reverse), a 400-bp fragment of tgm7 with primers 5′-GGGAGTGGCCTCATCAATGG (forward) and 5′-CCTTGACCTCACTGCTGCTGA (reverse), a ~600-bp fragment of tgm3 with primers 5′-TCGGTGGCAGCCTCAAGATTG (forward) and 5′-AGACATCAATGGGCAGCATGG (reverse), and 655- and 232-bp fragments of tgm2 with primers 5′-TTGGGGAGCTGGAGAGCAAC (forward) and 5′-ATCCAGGACTCCACCCAGCA (reverse) and primers 5′-(GCGGCCGCTAGT)CCACATTGCAGGGCTCCTGAC (forward) and 5′-GCTAGCCTGTGCTCACCATGAGG (reverse), respectively. PCRs were carried out in a GeneAmp 9600 thermal cycler with 0.035 units/μl AmpliTaq Gold polymerase in standard reaction buffer containing 2 mM MgCl2, 0.2 mM dNTPs, 0.4 μM of each primer, and 2.5 ng/μl genomic DNA in a total reaction volume of 25 μl. PCR conditions were: polymerase activation for 10 min at 95°C, annealing at 60°C for 45 s, extension at 72°C for 1 min, and denaturation at 94°C for 30 s for 35 cycles with a final extension of 3.5 min at 72°C. PCRs were analyzed by agarose gel electrophoresis using 1 or 1.5% gels. The hybrid cell panel was analyzed at least twice in each case to exclude PCR-related errors. The data were submitted to the Jackson Laboratory radiation hybrid data base for analysis and mapped relative to known genomic markers (www.jax.org/resources/documents/cmdata/rhmap).
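The per-reaction amounts implied by the concentrations given for the radiation hybrid PCRs can be worked out with a few lines of Python. This is a sketch for orientation and not part of the original protocol; it assumes that the per-volume units are per microliter and that the primer concentration is micromolar, and the function name `per_reaction_amounts` is hypothetical.

```python
def per_reaction_amounts(volume_ul: float = 25.0) -> dict:
    """Convert the stated concentrations into amounts per reaction.

    Assumed units: polymerase in units/ul, primers in uM, template in ng/ul.
    """
    return {
        "polymerase_units": 0.035 * volume_ul,   # units/ul * ul = units
        "primer_pmol_each": 0.4 * volume_ul,     # uM * ul = pmol
        "template_ng": 2.5 * volume_ul,          # ng/ul * ul = ng
    }


for name, value in per_reaction_amounts().items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions each 25-μl reaction contains a little under one unit of polymerase, 10 pmol of each primer, and 62.5 ng of hybrid cell DNA.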

Structure of the TGM5 Gene
Isolation of Genomic Clones-A unique insertion of ~30 amino acids between the catalytic core domain and β-barrel domain 1 found in TG X served as a template to design specific primers for screening of a human genomic library. The characterization of several genes of the transglutaminase gene family showed that the positions of the introns have been highly conserved (12, 13-15, 17-19, 36), and a comparison of the TG X sequence to the sequences of the other transglutaminases indicated that this unique sequence is present within an exon, exon X (see Fig. 7, amino acids 460-503 in TG X). A PCR from human genomic DNA using oligonucleotides P1 and P2, which match sequences at either end of this unique segment, yielded a DNA fragment of the expected size that was confirmed to be the correct product by sequencing (results not shown). Screening of a human genomic DNA BAC library by PCR using these oligonucleotides revealed two positive clones, BAC-33(P5) and BAC-228(P20), which were subsequently shown by Southern blotting with different cDNA probes to contain sequences spanning at least exon II to exon X of the TGM5 gene (results not shown). Restriction analysis further indicated that each of the BAC clones contained substantially more than 50 kb of human genomic DNA.
Gene Structure-The similarity in the gene structure of the different transglutaminase genes prompted us to approach the characterization of introns by PCR amplification using oligonucleotide primers corresponding to the flanking exon sequences at the presumptive exon/intron boundaries. All intron/exon boundaries were sequenced from the PCR products obtained in at least two independent PCRs, where applicable from both BAC clones, to exclude mutations introduced by Taq DNA polymerase, and the results were compared. When sequences of PCR products comprising adjacent introns had no overlap, the intervening sequence (exon sequence) was determined by direct sequencing from isolated BAC plasmid DNA to confirm the absence of additional introns. Similarly, the 3′-untranslated region was obtained by stepwise extension of the known sequence using direct sequencing of BAC plasmid DNA. Both BAC clones terminated short of exon I, and all attempts at isolating clones spanning exon I by the screening of BAC, P1-derived artificial chromosome, and P1 libraries with a cDNA probe or by PCR failed. Exon I and intron 1 sequences were finally derived by nested PCR from human genomic DNA using conditions optimized for long range genomic PCR. We established that the TGM5 gene is comprised of ~35 kb of genomic DNA and contains 13 exons and 12 introns (Fig. 1). All intron/exon splice sites conformed to the known GT/AG donor/acceptor site rule and essentially to the consensus sequence proposed by Mount (37) (Table II). A sequence homologous to the branch point consensus CTGAC (38) was found 24-44 nt upstream of the 3′ splice site in introns 1, 3-6, and 9-12. The size of the introns varied considerably, ranging from 106 bp to more than 11 kb (Fig. 1, Table III). The sequences obtained from the two different BAC clones matched with the exception of a deletion spanning the sequence from introns 6 to 8 found in BAC-33(P5) (Fig. 1).
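The two sequence checks described above (the GT/AG rule and a branch point consensus within a window upstream of the 3′ splice site) can be expressed as a short Python sketch. It is illustrative only and not the procedure used for the analysis; the function name `check_intron`, the exact-match search for CTGAC, the distance convention, and the synthetic test intron are all assumptions made for the example.

```python
import re


def check_intron(intron: str, window=(24, 44)) -> dict:
    """Check an intron (5' to 3') for GT/AG ends and CTGAC branch point matches.

    Distance convention (an assumption): number of nucleotides from the first
    base of a CTGAC match to the last base of the intron, inclusive.
    """
    seq = intron.upper()
    gt_ag = seq.startswith("GT") and seq.endswith("AG")
    distances = [len(seq) - m.start() for m in re.finditer("CTGAC", seq)]
    in_window = [d for d in distances if window[0] <= d <= window[1]]
    return {"gt_ag": gt_ag, "distances": distances, "in_window": in_window}


# Synthetic intron for demonstration only; it is not a TGM5 sequence.
toy_intron = "GTAAGT" + "A" * 20 + "CTGAC" + "T" * 25 + "AG"
print(check_intron(toy_intron))
```

A search that tolerates one mismatch would be closer to the "homologous to the consensus" wording in the text; the exact-match version is kept here for brevity.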
Allelic Variants-During the course of this work, we also resequenced the entire coding sequence of TG X and found three point mutations as compared with the previously reported cDNA sequence (4). One of the nt exchanges is silent, and the other two result in an amino acid exchange (Table IV). The first two mutations were found in both BAC clones, and the third was present only in BAC-228(P20) because of the deletion in the other BAC clone. These differences might be sequence polymorphisms in the human gene pool, because there was no ambiguity of the cDNA-derived sequence in this position determined from multiple independently amplified PCR products (4). However, the fact that a serine and an alanine residue are changed into a proline and a glycine residue, which constitute the conserved amino acids in these positions in the transglutaminase protein family (see Fig. 7, amino acids 67 and 352 in TG X), suggested that these may have been PCR-related mutations in the cDNA sequence. To clarify this issue, we have prepared cDNA from human foreskin keratinocytes from different individuals, amplified full-length cDNA with high-fidelity DNA polymerase, and sequenced the respective portions of the cloned cDNAs. The data confirmed that allelic variants exist with differences in these positions (Table IV).
Alternative Splicing-Isolation and sequencing of cDNAs encoding TG X and Northern blotting with TG X cDNA probes revealed expression of at least two differentially spliced mRNA transcripts for TG X in human keratinocytes (4). Solving the gene structure confirmed the short form of TG X to be the result of alternative splicing of exon III as predicted. A third isolated cDNA that differed also at the exon III/exon IV splice junction turned out to be the result of incomplete or absent splicing of intron 3, because the sequence upstream of exon IV in the cDNA matched with the 3′ sequence of intron 3. Exon 3 encodes part of the N-terminal β-barrel domain of TG X, and the absence of the sequence encoded by exon 3 is expected to result in major structural changes in at least this domain of the protein. Nevertheless, the expression of TG X in 293 cells using the full-length cDNA resulted in synthesis of two polypeptides with a molecular weight consistent with that of the predicted products from the alternatively spliced transcripts (results not shown).
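The relationship between exon length and reading frame that underlies this result can be illustrated with a small Python sketch. It is an aid to the reader rather than part of the analysis; the function name `skip_exon` is hypothetical, and the exon length of 246 nt is inferred here from the reported protein sizes (720 − 638 = 82 residues), assuming that skipping of exon III accounts for the entire difference.

```python
def skip_exon(protein_aa: int, exon_nt: int):
    """Predict the effect of skipping one internal coding exon.

    Returns (frame_preserved, shortened_length_in_aa); the length is None
    when the exon length is not a multiple of three and the frame shifts.
    """
    frame_preserved = exon_nt % 3 == 0
    if not frame_preserved:
        return False, None
    return True, protein_aa - exon_nt // 3


# 246 nt is an inferred value (82 codons), not a measured exon size.
print(skip_exon(720, 246))
# An exon of 245 nt would shift the reading frame instead.
print(skip_exon(720, 245))
```

Because the inferred length is a multiple of three, removal of the exon shortens the protein without altering the downstream reading frame, which is consistent with the two polypeptides observed on expression in 293 cells.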
5′-Untranslated Region-Initially, 5′-RACE was used to determine the 5′ end of TG X cDNAs. Transcripts starting at 77, 96, and 157 nt upstream of the initiator ATG were isolated in addition to the previously described shorter transcript (Fig. 2B, arrowheads). All of these transcripts were recovered repeatedly in independent experiments. Finally, primer extension experiments located the major transcription initiation site used in keratinocytes 157 nt upstream of the translation start codon (Fig. 2A). The proximal promoter region was analyzed for potential binding sites of transcription factors using MatInspector (Genomatix, Munich, Germany) and GCG (Genetics Computer Group, Inc., Madison, Wisconsin) software packages. No classical TATA-box sequence was found, but a number of other potential transcription factor binding sites could be identified (Fig. 2B), suggesting that the TG X promoter is a TATA-less promoter. Interaction of CCAAT/enhancer-binding protein (may bind to CAAT box), nuclear factor I (NF1), and upstream stimulatory factor (USF) to form a core proximal promoter has been demonstrated in a number of TATA-less genes. c-Myb is found in TATA-less proximal promoters of genes involved in hematopoiesis and often interacts with Ets factors (39), and these sites may be operative in the expression of TG X in hematopoietic cells, e.g. HEL cells. AP1, Ets, and SP1 elements are typically found in keratinocyte-specific genes (40,41) and may be involved in transcriptional regulation in keratinocytes. Several AP1 sites are present within 2.5 kb of upstream sequence and could interact with the proximal AP1 factor for activation. SP1 sites are positioned properly upstream of the start points of the shorter transcripts, raising the possibility that these could also be functional in transcription initiation, although to a lesser degree.
3′-Untranslated Region-The last exon, exon XIII, contained a consensus polyadenylation signal AATAAA ~600 bp downstream of the termination codon (Fig. 3). This is in good agreement with the size of the mRNA (2.8 kb) encoding full-length TG X expressed in human keratinocytes as detected by Northern blotting (4) considering the length of the coding sequence (2160 bp). A sequence matching 4 of 5 nt of the CAYTG signal that binds to U4 small nuclear RNA (42) is present in tandem in three copies 7 nt downstream of the polyadenylation signal. A close match (YCTGTTYY) of another consensus sequence, YGTGTTYY, that is found in many eukaryotic transcripts and provides a signal for efficient 3′ processing (43) is present 46 nt downstream of the polyadenylation signal. However, we have reported previously that all cDNAs isolated by reverse transcription PCR with an oligo(dT) oligonucleotide from human keratinocytes ended within 9-34 nt downstream of the pentanucleotide ATAAA at position 2317 (Fig. 3) (4). It has been shown that this pentanucleotide functions as a polyadenylation signal in other genes (42), and it is apparently functional in TG X. Indeed, the most frequently found poly(A) addition sites (indicated in Fig. 3) showed cleavage at a C(A) boundary, which is the preferred sequence for cleavage/polyadenylation in eukaryotic genes (44). However, no CAYTG or YGTGTTYY signal was found immediately downstream of the ATAAA signal sequence. A possible explanation for these findings is that most transcripts use the AATAAA signal for polyadenylation, as suggested by the Northern blotting data, whereas the ATAAA signal functions only occasionally as a polyadenylation signal, and these shorter transcripts are selectively enriched by PCR amplification because of the smaller size of the PCR product.
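Searches for consensus elements of this kind, written with the IUPAC code Y for a pyrimidine (C or T), can be sketched in Python as follows. This is a minimal illustration and not the software used for the analysis; the helper name `find_motif` and the toy sequence are our own, and only exact matches to each consensus are reported.

```python
import re

# Subset of the IUPAC nucleotide codes, expressed as regex character classes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "Y": "[CT]", "R": "[AG]", "N": "[ACGT]"}


def find_motif(seq: str, consensus: str) -> list:
    """Return 0-based start positions of all (overlapping) consensus matches."""
    pattern = "".join(IUPAC[base] for base in consensus.upper())
    # A lookahead is used so that overlapping matches are not skipped.
    return [m.start() for m in re.finditer(f"(?=({pattern}))", seq.upper())]


# Synthetic sequence for demonstration only; it is not the TGM5 3'-UTR.
TOY_UTR = "GGC" + "AATAAA" + "CCGGCC" + "CACTG" + "GG" + "TGTGTTCC"

for motif in ("AATAAA", "ATAAA", "CAYTG", "YGTGTTYY"):
    print(motif, find_motif(TOY_UTR, motif))
```

Note that every AATAAA hexanucleotide also contains the pentanucleotide ATAAA one position downstream, so hits for the shorter signal need to be filtered against hits for the longer one.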

The TGM5 Gene Is Part of a Cluster of Transglutaminase Genes
Chromosomal Localization of the TGM5 Gene-To address the genomic organization of the TGM5 gene or genes in the human genome, we performed Southern blot analysis of human genomic DNA cut with BamHI, EcoRI, and HindIII restriction enzymes using probes derived from intron 2 as well as from the sequence encoded by exon X that is unique to TG X . Bands of 4.5, 6.0, and 10.5 kb and 4.3, 9.3, and 2.6 kb were revealed with the respective probes. The simple pattern of restriction fragments hybridizing with the probes indicated that the haploid human genome contains only one TGM5 gene.
The TGM5 gene was subsequently localized to chromosome 15 by fluorescence in situ hybridization on human metaphase chromosome spreads using genomic DNA derived from either BAC clone as a probe (Fig. 4A). A comparison of the probe signal to the 4′,6-diamidino-2-phenylindole (DAPI) banding pattern localized the TGM5 gene to the 15q15 region. The localization was subsequently refined by determining the distance of the fluorescent signal to the centromere as well as to either end of the chromosome on 13 copies of chromosome 15 and expressing it as a fractional distance of the total length of the chromosome. These measurements placed the TGM5 gene close to the center of the 15q15 region, i.e. to the 15q15.2 locus (Fig. 4B).
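The fractional distance calculation can be summarized in a short Python sketch. It is given for illustration only: the function names are hypothetical, the convention of measuring from the short-arm terminus is an assumption, and the numbers in the usage example are invented rather than taken from the measurements on the 13 chromosomes.

```python
from statistics import mean


def fractional_length(distance_from_pter: float, chromosome_length: float) -> float:
    """Signal position as a fraction of total chromosome length (0 = pter, 1 = qter)."""
    if not 0 <= distance_from_pter <= chromosome_length:
        raise ValueError("signal must lie on the chromosome")
    return distance_from_pter / chromosome_length


def mean_fractional_length(measurements) -> float:
    """Average the fractional position over several measured chromosomes.

    measurements: iterable of (distance_from_pter, chromosome_length) pairs
    in any common unit, e.g. pixels on the captured image.
    """
    return mean([fractional_length(d, total) for d, total in measurements])


# Invented example values, in arbitrary image units.
print(mean_fractional_length([(1.0, 4.0), (3.0, 8.0)]))
```

Expressing each measurement as a fraction before averaging makes the result independent of differences in chromosome condensation between spreads.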
TGM5 and EPB42 Genes Are Arranged in Tandem-The EPB42 gene (Table I) has been assigned previously to the 15q15 locus on chromosome 15 (27,28). This raised the possibility that the EPB42 gene is in close proximity to the TGM5 gene. Indeed, PCR with specific primers for sequences derived from the 5′ and 3′ ends of the EPB42 gene yielded products of appropriate size from both BAC-33(P5) and BAC-228(P20) (data not shown), and sequencing confirmed the identity of the PCR products. Southern blotting of BAC plasmid DNA with cDNA probes comprising the 5′ or 3′ end of the EPB42 gene and subsequent comparison of the pattern of labeled restriction fragments with that of the TGM5 gene allowed us to map this locus in more detail. The EPB42 and TGM5 genes are arranged in the same orientation, being spaced apart by ~11 kb (Fig. 4C).
Identification of a Novel Transglutaminase Gene, TGM7-We developed a method for the detection and identification of transglutaminase gene products based on reverse transcription PCR of the conserved active site sequence with degenerate primers, and using this method we discovered the gene product of the TGM5 gene (4). Using this same method, we have now identified another new transglutaminase gene product in human prostate carcinoma tissue that we designated TG Z . The full-length cDNA for this new gene product was obtained by anchored PCR as described below. A comparison of the TG Z cDNA sequence to the sequences in the GenBank™ database revealed a match with two draft sequences of human BAC clones (AC009852 (assigned to chromosome 15) and AC009825 (assigned to chromosome 8)). These DNA contigs also gave a perfect match with our TGM5 gene sequences, suggesting that the TGM7 gene is located in close proximity to the TGM5 and EPB42 genes. Although the human BAC clones (Fig. 1) terminate within the TGM5 gene and do not contain sequences of the TG Z gene product, we have isolated a mouse BAC clone that contains all three genes (data not shown). To further confirm the organization of the TGM7 gene in the human genome, we used long range genomic PCR with different combinations of primers designed from the flanking sequences of the TGM5-EPB42 gene and the TG Z cDNA sequence to explore whether the TGM7 gene was present in close proximity to the other two transglutaminase genes. This placed the TGM7 gene ~9 kb upstream of the TGM5 gene and demonstrated that the genes are arranged in tandem fashion (Fig. 4C). The sequence between the TGM7 and TGM5 genes was further found to contain a pseudogene of the mitochondrial ATPase D subunit, the relevance of which is unclear.

Determination of the cDNA and Amino Acid Sequence of the TGM7 Gene Product
Based on the initial PCR amplification of a TG Z cDNA fragment from prostate carcinoma tissue, we analyzed prostate-derived cell lines for expression of TG Z by PCR. A full-length cDNA sequence for TG Z was obtained by anchored PCR using cDNA prepared from the human prostate carcinoma cell line PC-3, essentially following the strategy described previously (4). 5′-RACE was used to determine the 5′ end of the cDNA. The isolated 3′-end sequence was confirmed by sequencing of matching expressed sequence tag clones in the GenBank™ database (accession numbers AI018564, AI024635, and AW511368). The obtained sequence information consists of  L1 Sub D)), an L1 repetitive element, and genetic markers (for L926G10 see Footnote 3). The probable initiation codon is present in the sequence GAGATGG, which presents only limited homology to the consensus identified by Kozak (45), which acts as a signal for efficient translation in eukaryotes. However, the critical purine in position −3 as well as the G in position +4 are conserved. The signal for polyadenylation (AATAAA) is located 158 nt downstream of the termination codon (TGA). The deduced protein consists of 710 amino acids and has a calculated molecular mass of 80,065 Da and an isoelectric point of 6.6.
During the course of the TG Z cDNA sequence determination, a number of aberrantly spliced gene products were isolated. These lacked exon VI or part of exon IX (5′ end) or retained the whole or part of intron 11. These products are unlikely to be of physiological significance but may point out that the splicing of certain introns in this gene is a difficult and inefficient process. Interestingly, in the variant lacking part of exon IX, splicing occurred from the donor site of exon VIII to an acceptor site in the middle of exon IX, which corresponds to the exact position where an additional intron is present in the TGM1 and F13A1 genes (Fig. 7).
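The statement about the initiation codon context can be checked mechanically. The following minimal sketch examines only the two positions discussed for the sequence GAGATGG, using the usual convention that the A of the ATG is position +1 and the base immediately 5′ of it is position −1. The function name and the returned dictionary are illustrative choices, not part of the original analysis.

```python
PURINES = {"A", "G"}


def kozak_core(context: str) -> dict:
    """Report the bases at positions -3 and +4 relative to the first ATG in `context`."""
    seq = context.upper()
    start = seq.find("ATG")  # index of the A at position +1; -1 if no ATG is present
    if start < 3 or start + 3 >= len(seq):
        raise ValueError("need an ATG with at least 3 nt upstream and 1 nt downstream")
    minus3 = seq[start - 3]  # three bases upstream of the A (there is no position 0)
    plus4 = seq[start + 3]   # base immediately following the ATG
    return {
        "minus3": minus3,
        "plus4": plus4,
        "purine_at_minus3": minus3 in PURINES,
        "g_at_plus4": plus4 == "G",
    }


# Sequence context reported for the probable TG Z initiation codon.
result = kozak_core("GAGATGG")
```

For GAGATGG both flags are true: the base at −3 is G (a purine) and the base at +4 is G.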

Gene Products of the Transglutaminase Gene Cluster Are Differentially Expressed in Tissues
We have shown previously that TG X is expressed in a number of different cell types (4). To obtain a more complete picture of the expression pattern of TG X and the novel gene product, we performed a dot-blot Northern blot analysis of more than 50 adult and fetal human tissues. Probes comprising sequences of the β-barrel domains were employed, the specificity of which has been documented previously (4). Band 4.2 protein was expressed at high level in bone marrow, fetal spleen, and fetal liver, consistent with its role in hematopoietic cells, and was virtually undetectable in all other tissues (Fig. 6). In contrast, TG X and TG Z showed widespread expression at low level. The highest levels of TG X and TG Z mRNA were present in the female reproductive system (uterus, placenta, ovaries, and mammary gland) and fetal tissues, and in testis and lung, respectively (Fig. 6). Reverse transcription PCR analysis on human cell lines and tissues confirmed these results and demonstrated that TG Z is expressed in osteosarcoma cells (MG-63), dermal fibroblasts (TJ6F, HCA2), erythroleukemia cells (HEL), primary keratinocytes, mammary epithelium carcinoma cells (MCF7), HeLa cells, testis, skin, brain, heart, kidney, lung, pancreas, placenta, skeletal muscle, fetal liver, prostate, and prostate carcinoma tissue (data not shown). These results demonstrate that all three gene products of this gene cluster are differentially expressed in tissues.

Evolution of Transglutaminase Genes
Mapping of Mouse Genes-To further analyze the relationship between the most closely homologous genes of the transglutaminase gene family (TGM7, TGM5, and EPB42 on human chromosome 15q15.2 and TGM2 and TGM3 on chromosome 20q11/12), we mapped the respective mouse genes using radiation hybrid mapping. All genes mapped to the distal part of mouse chromosome 2. The genes for tgm7, tgm5, and epb42 showed a best-fit location for the segment defined by D2Mit104 proximal and D2Mit396 distal, with a highest LOD of 14.5, 16.6, and 15.5 to the anchor marker D2Mit395 (66.9 cM) and an LOD of >20 to D2Ertd616e (69.0 cM). This is in good agreement with the assigned locus of epb42 67.5 cM distal from the centromere (46). The tgm3 gene showed a best-fit location for the segment defined by D2Mit447 proximal and D2Mit258 distal, with a highest LOD of 14.8 and 12.2 to D2Mit258 (78.0 cM) and D2Mit338 (73.9 cM), respectively. The tgm2 gene showed a best-fit location for the segment defined by D2Mit139 proximal (86.0 cM) and D2Mit225 distal (91.0 cM), with a highest LOD of 17.0 to the anchor marker D2Mit287, consistent with its assigned locus 89.0 cM from the centromere (47).

DISCUSSION
In this study, we have shown that human TG X is the product of an ~35-kb gene located on chromosome 15q15.2 and containing 13 exons and 12 introns. The intron splice sites conform to the consensus for splice junctions in eukaryotes (37). The transcription initiation site is localized 157 nt upstream of the initiator methionine, and the likely polyadenylation site is localized ~600 nt downstream of the stop codon, which is consistent with the previously reported transcript size of about 2.8 kb (4). The short form of TG X , which we have identified in keratinocytes (4), is the product of alternative splicing of exon III.
The TGM5 gene is part of a cluster of three transglutaminase genes arranged in tandem, with a novel transglutaminase gene proximal to the TGM5 gene and the gene encoding erythrocyte band 4.2 protein distal. We have used the designation TGM7 for the novel gene and TG Z for its gene product, which is in line with using sequential numbers and letters for genes and proteins, respectively, in the order of discovery.
Structural and Functional Implications for the Novel Transglutaminase Gene Product Based on Comparative Sequence Analysis-We have determined the full-length cDNA sequence of the novel transglutaminase gene product, TG Z . A comparison of TG Z to the previously characterized human transglutaminases reveals that the structural requirements for transglutaminase activity and Ca2+ binding are conserved. The structure of several transglutaminases has been solved by x-ray crystallography and shows a high degree of similarity (20-22). The reaction center is formed by the core domain and involves hydrogen bonding of the active site Cys to a His and Asp residue to form a catalytic triad reminiscent of the Cys-His-Asn triad found in the papain family of cysteine proteases (23). The residues comprising the catalytic triad are conserved in TG Z (Cys279, His338, and Asp361) (Figs. 5 and 7), and the core domain shows a high level of conservation as indicated by a sequence identity of ~50% as compared with the other transglutaminases. A Tyr residue in the barrel 1 domain of the a-subunit of factor XIII is hydrogen-bonded to the active site Cys residue, and it has been suggested that the glutamine substrate attacks from the direction of this bond to initiate the reaction based on analogy to cysteine proteases (24). It has further been proposed that this hydrogen bond prevents the formation of a disulfide bond between the active site residue and a neighboring Cys, which would result in enzyme inactivation (22). In TG Z , the Tyr residue has been replaced by a His residue (His538), similar to TG X (Fig. 7). This is expected to be a conservative change, which is supported by our data demonstrating that recombinant TG X produced in 293 cells has transglutaminase activity. 2 The Trp residue (Trp279 in factor XIII), which is thought to stabilize the oxyanion intermediate generated in the proposed reaction mechanism (23), is conserved in TG Z (Trp243).
All these residues involved in the catalytic process are conserved in the different transglutaminase gene products with the exception of band 4.2 protein, which is the only member of this gene family without catalytic activity (Fig. 7). Crystallization experiments with factor XIIIa further indicated that four residues are involved in the binding of a Ca2+ ion, including the main chain carbonyl of Ala457 and the side chain carboxyl groups of Asp438, Glu485, and Glu490 (24). All three acidic residues are conserved in TG Z (Asp403, Glu450, and Glu455). Based on the preservation of critical residues for enzyme function and domain folding and the extensive overall similarity of TG Z to the other members of the transglutaminase family with catalytic activity, it is likely that the characterized cDNA encodes an active transglutaminase.
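The correspondence between factor XIII and TG Z residue numbers quoted above can be made explicit with a short calculation. The sketch below uses only the positions given in the text and computes the numbering offset for each pair; the dictionary labels are illustrative, and reading a constant offset as an ungapped stretch of the alignment is an informal consistency check, not a result of the original analysis.

```python
# (factor XIII a-subunit position, TG Z position) for residues named in the text.
residue_pairs = {
    "Trp (oxyanion stabilization)": (279, 243),
    "Asp (Ca2+ binding)": (438, 403),
    "Glu (Ca2+ binding, first)": (485, 450),
    "Glu (Ca2+ binding, second)": (490, 455),
}

# Numbering offset between the two sequences at each aligned residue.
offsets = {name: f13 - tgz for name, (f13, tgz) in residue_pairs.items()}

for name, offset in offsets.items():
    print(f"{name}: offset {offset}")
```

The three Ca2+-binding residues share an offset of 35, as would be expected if this stretch aligns without gaps, whereas the Trp residue differs by 36, which would be compatible with a single-residue gap between the two regions.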
Organization of Transglutaminase Genes in the Human and Mouse Genome-Having established the chromosomal localization of the TGM5 and other human transglutaminase genes, it is of interest to compare it with the equivalent chromosomal regions in the genomes of other species and in particular of the mouse, for which many genetic aberrations have been mapped.
This may provide insights into the evolution of a gene family as well as help in the identification of candidate genes for human pathologies.
In this study, we have mapped the gene encoding TG X to chromosome 15q15 by fluorescent in situ hybridization. Band 4.2 protein has been mapped previously to this chromosomal region (27,28) and subsequently has been assigned to 15q15.2 by expression mapping of the LGMD2A locus on chromosome 15 (48). A short sequence encompassing the left arm of one of the yeast artificial chromosome clones (926G10) 3 used for expression mapping matched with the sequence of intron 12 of the TGM5 gene, which is consistent with our data placing the two genes encoding TG X and band 4.2 protein in direct apposition. In mouse, the syntenic region is found on chromosome 2, 2F1-F3 (48,49), and our radiation hybrid data place all three genes, tgm7, tgm5, and epb42, in this segment of chromosome 2. In addition, we have isolated a mouse BAC clone and shown that it contains all three genes. Even though the exact positioning awaits further sequence data, this suggests that all three mouse genes are in close proximity, presumably arranged in a similar fashion to their human counterparts. Our mapping data are consistent with the previously determined location of the epb42 gene by interspecific backcross analysis (46). Data base analysis identified two mutations that are associated with this locus, ro (rough) and pa (pallid). In fact, it had been suggested that pa might be a mutation in band 4.2 protein (46), but more recently it has been shown that pa/pa mice express normal band 4.2 by cDNA analysis and that the two loci segregate in an interspecific cross (50). The phenotype associated with neither of these mutations seems to be a match with expected tissue impairments of a transglutaminase deficiency. Similarly the pathologies of known congenital human diseases associated with the respective locus are not consistent with a transglutaminase deficiency.
FIG. 7. … 3 (34); TG E (54); TG C (55); factor XIII a-subunit (56,57); TG K (58, 59); and TG P (30,60)) is shown, with dashes indicating gaps inserted for optimal sequence alignment and underlined residues representing amino acids conserved in at least four gene products. The sequences are arranged to reflect the transglutaminase domain structure based on the crystal structure of factor XIII a-subunit (20): N-terminal propeptide domain (d1), β-sandwich domain (d2), catalytic core domain (d3), and β-barrel domains 1 (d4) and 2 (d5) (from top to bottom). Known intron splice sites are marked by arrowheads (12-19). The aberrant splice donor site within exon IX of TG Z is marked by a black square.
FIG. 8. Phylogenetic tree of the transglutaminase gene family and genomic organization of the genes in man and mouse. Phylogenetic trees based on the amino acid sequence homology of the gene products have been constructed for the individual domains as well as for the entire gene products using the neighbor-joining method (61) of the PHYLIP or Treecon 1.3b (van de Peer, University of Antwerp, Belgium) software packages. Sequences were aligned using ClustalX and corrected by hand as needed (nonhomologous N- and C-terminal extensions were excluded) to maximize homology as shown in Fig. 7 except including sequences from different species as available. A representative tree based on full-length sequences is shown in A (Treecon, distance estimation according to Tajima and Nei). Bootstrap values are given in percentage and the scale bar reflects 0.05 substitutions per site. In B, a hypothetical pedigree for the gene family is given that is consistent with the data on the sequence relationship of the individual gene products (A) as well as with the data on gene structure and genomic organization (C).
Interestingly, the syntenic region of human chromosome 20q11 is found on distal mouse chromosome 2 adjacent to the locus of the identified transglutaminase gene cluster. The genes encoding TG C and TG E , which are more closely related to the TGM5, TGM7, and EPB42 genes than the other transglutaminase genes based on amino acid sequence comparison and similarity in gene structure (Fig. 7), have been mapped to human chromosome 20q11 (14,26). We recently have cloned TG Y , which is the product of the TGM6 gene that neighbors the TGM3 gene. 4 We have now identified the chromosomal location of the mouse tgm2 and tgm3 genes by radiation hybrid mapping and identified positions ~20 and 5 cM distal to the tgm7/tgm5/epb42 gene cluster, respectively.
This places a total of six transglutaminase genes on distal mouse chromosome 2 and raises questions about their evolutionary and present-day relationship.
Evolution of Transglutaminase Genes-To analyze the evolutionary relationship between the transglutaminase genes in more detail, we calculated the amino acid similarity essentially based on the sequence alignment shown in Fig. 7 and calculated evolutionary distances using different algorithms. All algorithms predicted a close relationship between factor XIII a-subunit and TG K (lineage 1), and TG X , TG Z , TG E , TG Y , band 4.2 protein, and TG C (lineage 2), respectively (Fig. 8). The genomic organization and gene structure also support the close relationship of the latter six genes and corroborate the placement of these in a separate phylogenetic branch. The exact relationship of TG P to these two lineages is less certain, but it is likely to have branched off from lineage 2 approximately at the same time as factor XIII a-subunit and TG K diverged. Only a single transglutaminase gene has been identified in genomes of invertebrate species, and it is thus likely that the separation of transglutaminases into four branches occurred after divergence of invertebrates from proto-vertebrates. In fact, a similar relationship between invertebrates and vertebrates has been found for many orthologous genes, and it has been proposed recently that two successive genome duplications in early vertebrates may have resulted in octaploidy (52,53). Our phylogenetic analysis is consistent with this model. One branch of lineage 2 has subsequently undergone multiple duplications locally to generate the six genes that are clustered on mouse chromosome 2. Despite the close relationship of neighboring genes within the clusters, one possible scenario is that a single transglutaminase gene initially locally duplicated and was followed by a duplication of a larger segment of the chromosomal region, giving rise to the organization of the genes seen in mouse. In humans these chromosomal regions were apparently redistributed to two different chromosomes.
Our analysis further indicates that orthologues of these six genes are likely to exist in all higher vertebrate species.
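The first step of a distance-based phylogenetic analysis, turning an alignment into pairwise distances, can be sketched as follows. This is a simplified illustration and not the procedure used here: it computes the uncorrected proportion of differing residues (p-distance) over gap-free columns rather than a corrected distance, and the three short aligned sequences and their names are hypothetical placeholders, not transglutaminase sequences.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing residues between two aligned sequences, skipping gap columns."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differences = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # columns with a gap in either sequence are not scored
        compared += 1
        differences += x != y
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return differences / compared


# Hypothetical toy alignment; dashes mark gaps inserted for alignment.
alignment = {
    "seqA": "MKT-LLVAGC",
    "seqB": "MKTALLVSGC",
    "seqC": "MRT-LIVSGC",
}

# Pairwise distance table of the kind that a neighbor-joining program takes as input.
distances = {
    (m, n): p_distance(alignment[m], alignment[n])
    for m, n in combinations(alignment, 2)
}
```

In this toy alignment seqA and seqB are the closest pair, so a neighbor-joining tree built from the table would be expected to group them first.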
Functional Redundancy in Transglutaminase Gene Family-Despite differential expression in tissues and apparently distinct promoter organization (5,51), the expression patterns of TG C , TG Z , and TG X and to a more limited degree TG E overlap. We have preliminary data showing that in TG C null mice (11), the expression of several other transglutaminases (including TG X and TG E ) is up-regulated. 5 Based on the apparent lack of a phenotype in response to TG C gene ablation in mice, the close similarity of these gene products, and their overlapping expression, we hypothesize that these gene products have overlapping functions.
In conclusion, the identification of several new transglutaminases that are closely related to TG C could explain some of the contradictory data and large number of proposed functions in the literature regarding this enzyme and should stimulate the re-evaluation of these experiments in the light of these findings.