Accelerated evolution in inhibitor domains of porcine elafin family members.

Through the analysis of the porcine gene encoding the elastase inhibitor elafin, we demonstrated that there are at least three closely related members of the elafin family, and their genes have arisen by accelerated evolution. A porcine genomic DNA library was screened with a previously cloned human elafin cDNA probe, and several positive clones were obtained that can be distinguished by a combination of restriction enzymes. Sequence analysis of these clones revealed the presence of three homologous members whose genes, all consisting of three exons and two introns, are almost identical except the exon 2 sequences encoding the inhibitor domain called "WAP motif"; the intron sequences are related to each other with sequence similarities of 93-98%, whereas the exon 2 sequences exhibited only 60-77% similarities among the three members. The extreme divergence in the exon 2 sequences compared to the highly conserved intron sequences may be generated by accelerated mutations confined in a short stretch of the genes following recent duplication events of a single ancestral gene. An RNase protection assay indicated that the messages of the elafin family members are abundantly expressed in the trachea and intestine, suggesting that the most likely selective forces for the accelerated evolution are extrinsic proteinases produced by invasive microorganisms.

Elafin is a unique elastase inhibitor (1) having a transglutaminase substrate domain that serves as an anchoring sequence to confine the inhibitor at its sites of action (2-4). Elafin, also known as skin-derived antileukoproteinase (SKALP; Refs. 5 and 6) or elastase-specific inhibitor (ESI; Ref. 7), is synthesized as pre-elafin and secreted into the extracellular matrix space where it is cross-linked to certain structural proteins through its transglutaminase substrate domain by the action of transglutaminase (for review on transglutaminase, see Refs. 8-10). Although elafin was first isolated as a 57-amino acid protein with a characteristic disulfide-linked structure called four-disulfide core or WAP motif (11), the 57-amino acid form is now considered to represent the inhibitor domain generated, by proteolytic cleavage, from the 95-amino acid native elafin, which consists of the following two domains: 1) transglutaminase substrate domain that we termed cementoin moiety (3) and 2) inhibitor domain or WAP motif.
Biochemical and molecular biological studies on elafin have so far been performed using human materials. For more detailed studies, however, such as identification of the acceptor proteins to which elafin is anchored by covalent cross-linking through the transglutaminase substrate domain, we considered it more appropriate to use other mammalian species from which fresh materials are relatively easily obtainable, and we therefore initiated cloning and characterization of the porcine elafin gene. This work led to two unexpected findings: 1) there are at least three elafin-related genes that are very similar to each other over the entire length of about 4 kb including the intron sequences, suggesting that they have arisen by relatively recent gene duplications; and 2) in contrast to the extremely high similarity (97-98%) in the intron and parts of the exon sequences corresponding to the noncoding regions, the gene sequences coding for the inhibitor domains or the WAP motif regions are surprisingly variable (77-81%) among the newly found elafin family members. Such pronounced divergence within a short stretch of gene sequence that specifically affects the functionally important regions of the protein sequence is called accelerated evolution. Accelerated evolution, or an unusually high rate of mutation in a certain segment of the genes, is a process postulated to occur in genes following a duplication event (12,13) and is considered to be an effective mechanism to provide the hosts with a defense system against unwanted outsiders such as pathogens and parasites and, inversely, to provide the intruders with an increased capacity of invasion.
Currently, however, only a few cases of accelerated evolution have been reported. 1) The first case is a family of the serine proteinase inhibitors (serpins) in which an unusually high degree of polymorphism was found in a narrow region surrounding the reactive site. This original observation by Hill et al. (14,15) was later extended to other members of the serpin family by Borriello and Krauter (16), and the divergence was confirmed at the cDNA level, but their genes, especially the introns, have not been analyzed. Extrinsic proteinases used by parasites to facilitate their spread throughout the host are considered the most likely selective force. 2) An extremely high rate of nonsynonymous nucleotide substitutions that cause amino acid changes has been found in the active domain-coding region of the wheat thionin genes by cDNA cloning (17). Thionins are a family of cysteine-rich proteins that are active against plant pathogens. 3) Nakashima et al. (18,19) have found that the introns and noncoding regions of the Trimeresurus flavoviridis (Habu snake) venom gland phospholipase A2 isozyme genes are unusually conserved as compared to the protein-coding regions except for the presequence-coding regions, indicating that the genes have evolved so as to bring about accelerated amino acid substitutions in the mature protein-coding regions. An additional example of the accelerated evolution reported here for the elafin family genes will be helpful for both theoretical and experimental analyses of the unique events in the evolution of the genes. It should also be emphasized that the genes analyzed in the present study are the first mammalian family of genes that have unique introns whose sequences are exceptionally highly conserved (93-98% sequence identities) compared to their relatively divergent exons. The genes may therefore be one of the most recently duplicated gene families.

EXPERIMENTAL PROCEDURES
Materials-Fresh porcine tissues were obtained from the Shibaura abattoir sanitation inspection station, Tokyo, Japan; restriction enzymes, random primer DNA labeling kit version 2, and DNA ligation kit were from Takara, Kyoto, Japan; pBluescript II SK(-) plasmid vector and RNA transcription kit were from Stratagene; porcine genomic library constructed in EMBL3 SP6/T7 using Sau3A partial digest was from Clontech; double-stranded nested deletion kit and AutoRead sequencing kit were from Pharmacia, Uppsala, Sweden; 17-mer oligonucleotide primers labeled with fluorescein isothiocyanate were from Biologica Co., Nagoya, Japan; 32P-labeled nucleotides were from DuPont NEN; RPA II ribonuclease protection assay kit was from Ambion Inc., Austin, TX; RNase-free DNase I was from Boehringer Mannheim; RNase inhibitor RNasin was from Toyobo, Osaka, Japan; X-Omat AR x-ray film was from Kodak, Rochester, NY; nitrocellulose filters (BA 85, 0.45 μm) were from Schleicher and Schuell, Dassel, Germany; human elafin and N-methoxysuccinyl-alanyl-alanyl-prolyl-valine 4-methylcoumaryl-7-amide (Suc(OMe)-Ala-Ala-Pro-Val-MCA) were from Peptide Institute Inc., Osaka, Japan; porcine pancreatic elastase was from Sigma. The WAP motif regions or inhibitor domains of the porcine WAP family members WAP-1, WAP-2, and WAP-3 were chemically synthesized according to the published method (20,21) and named pWap-1, pWap-2, and pWap-3, respectively. Other chemicals were of reagent grade.
DNA Sequencing-For DNA sequencing, 3-3.2-kb HindIII and 628-bp EcoRI/HindIII fragments of the porcine WAP family genes were subcloned into the pBluescript II SK(-) plasmid vector (Stratagene). Overlapping deletion clones were obtained by exonuclease III digestion using a Pharmacia double-stranded nested deletion kit. Double-stranded DNA sequencing was performed by the dideoxy chain termination method (23) using a Pharmacia AutoRead sequencing kit with either T3, T7, or synthetic 17-mer primers labeled with fluorescein isothiocyanate.
Data Analysis of DNA and Amino Acid Sequences-DNA sequence data were organized and analyzed using the program GENETYX-MAC (Software Development Co., Ltd., Tokyo, Japan). Sequence comparison was done using the FASTA (24) and BLAST (25) electronic mail server on the GenomeNet and WAIS server in the DNA data bank of Japan. For the comparisons of nucleotide sequences among the human elafin gene and the genes of the porcine WAP family members, the numbers of nucleotide substitutions were estimated by the method of Jin and Nei (26); the numbers of nonsynonymous substitutions which cause amino acid changes and synonymous substitutions which cause no amino acid changes were computed according to the method of Nei and Gojobori (27) using the molecular evolution analysis system ODEN in the DNA data bank of Japan.
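The counting of synonymous and nonsynonymous substitutions described above can be illustrated with a minimal Python sketch of the unweighted Nei-Gojobori procedure. It is not the ODEN implementation used in this study. The function names are hypothetical; the input is assumed to be two aligned, in-frame, gap-free coding sequences in uppercase A/C/G/T without stop codons; mutational pathways that pass through a stop codon are simply skipped; and the Jukes-Cantor formula is used as the multiple-hit correction. The method of Jin and Nei (26) is not covered.

```python
from itertools import permutations
from math import log

BASES = "TCAG"
# Standard genetic code with codons enumerated in TCAG order ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def synonymous_sites(codon):
    """Number of synonymous sites s in one codon (nonsynonymous sites = 3 - s)."""
    s = 0.0
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODON_TABLE[mutant] == CODON_TABLE[codon]:
                s += 1 / 3  # each position has three possible changes
    return s


def count_differences(c1, c2):
    """Return (synonymous, nonsynonymous) differences between two codons,
    averaged over all single-step pathways that avoid stop codons."""
    diff_positions = [i for i in range(3) if c1[i] != c2[i]]
    if not diff_positions:
        return 0.0, 0.0
    syn_total = nonsyn_total = 0.0
    n_paths = 0
    for order in permutations(diff_positions):
        current, syn, nonsyn, valid = c1, 0, 0, True
        for pos in order:
            nxt = current[:pos] + c2[pos] + current[pos + 1:]
            if CODON_TABLE[nxt] == "*":
                valid = False  # assumption: discard pathways through stops
                break
            if CODON_TABLE[nxt] == CODON_TABLE[current]:
                syn += 1
            else:
                nonsyn += 1
            current = nxt
        if valid:
            syn_total += syn
            nonsyn_total += nonsyn
            n_paths += 1
    if n_paths == 0:
        raise ValueError("no stop-free pathway between %s and %s" % (c1, c2))
    return syn_total / n_paths, nonsyn_total / n_paths


def nei_gojobori(seq1, seq2):
    """Return (S, N, Sd, Nd): synonymous and nonsynonymous sites and differences."""
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned, in frame, and of equal length")
    codons1 = [seq1[i:i + 3] for i in range(0, len(seq1), 3)]
    codons2 = [seq2[i:i + 3] for i in range(0, len(seq2), 3)]
    if any(CODON_TABLE[c] == "*" for c in codons1 + codons2):
        raise ValueError("stop codons are not supported in this sketch")
    # S is averaged over the two sequences; N is the remainder of all sites.
    S = (sum(map(synonymous_sites, codons1)) + sum(map(synonymous_sites, codons2))) / 2
    N = len(seq1) - S
    Sd = Nd = 0.0
    for c1, c2 in zip(codons1, codons2):
        syn, nonsyn = count_differences(c1, c2)
        Sd += syn
        Nd += nonsyn
    return S, N, Sd, Nd


def jukes_cantor(p):
    """Correct a proportion of differences p for multiple hits (requires p < 0.75)."""
    return -0.75 * log(1 - 4 * p / 3)
```

For the codon pair TTT/GTA, `count_differences` returns 0.5 synonymous and 1.5 nonsynonymous differences, the average over the two possible mutational pathways. Under these assumptions, `jukes_cantor(Sd / S)` and `jukes_cantor(Nd / N)` would correspond to the synonymous and nonsynonymous substitution rates, respectively.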
RNase Protection Analysis-Total RNA was prepared from porcine tissues by the acid guanidinium thiocyanate-phenol-chloroform method (28). Antisense RNA probes were prepared from the subclones carrying the 249-bp BglII/BamHI fragment of pgWAP-1, the 417-bp BglII fragment of pgWAP-2, or the 359-bp BglII fragment of pgWAP-3 in the BamHI site of pBluescript II SK(-), which contains the T3 promoter for transcription of antisense RNA. These three fragments were designed to contain parts of exon 2 and intron 2. Riboprobes were transcribed from SalI-linearized templates using an RNA transcription kit in the presence of [α-32P]UTP and RNasin, an RNase inhibitor. The template DNA was digested with RNase-free DNase I, and the labeled probe was purified by gel filtration using a Pharmacia NICK column. The following procedures were carried out using an RPA II ribonuclease protection assay kit. The radiolabeled probe (1-2 × 10^5 cpm) was hybridized to 5 μg of total RNA from various porcine tissues, or to 5 μg of yeast tRNA as a negative control, and incubated overnight at 43°C. Single-stranded RNA was removed by digestion with DNase-free RNase A and RNase T1, and the protected fragments were separated on a 5% denaturing polyacrylamide gel containing 7 M urea and quantitated by fluorography.
Measurement of Elastase-inhibitory Activities of WAP Family Members-The inhibitory activities of human elafin and the structurally related porcine proteins WAP-1, WAP-2, and WAP-3 were assayed according to a modification of Wiedow et al. (1) by the use of the fluorogenic substrate Suc(OMe)-Ala-Ala-Pro-Val-MCA. Samples (100 μl) were preincubated with porcine pancreatic elastase (1 μg in 100 μl) in assay buffer (0.1 M Hepes, 0.5 M NaCl, 10% Me2SO, 0.01% lysozyme, pH 7.5) for 30 min at room temperature, and the remaining elastase activities were measured by adding 1.8 ml of assay buffer containing 0.1 mM substrate. The fluorescence intensities were monitored at an excitation wavelength of 383 nm and an emission wavelength of 455 nm using a Hitachi 850 fluorescence spectrophotometer.

RESULTS
Isolation and Sequences of the Genes Encoding Porcine Elafin and Its Family Members-Screening of a porcine genomic DNA library with a 3-kb XbaI/BamHI fragment of human elafin gene (22) led to the isolation of 13 positive clones. Restriction mapping with EcoRI, BamHI, BglII, HindIII, SacI, and SpeI revealed that they could be divided into three groups with very similar but slightly different restriction enzyme sites (Fig. 1); the three independent clones were designated pgWAP-1, pgWAP-2, and pgWAP-3. Southern blot and sequencing analyses of the restriction enzyme fragments of the three clones revealed that the common ~3.7-kb EcoRI/HindIII fragment contained the entire gene sequence that is divided into three exons by two introns (Figs. 1 and 2). Fig. 2 shows the nucleotide sequence of porcine elafin gene (pgWAP-1) and those of the 5′- and 3′-flanking regions. The nucleotide sequences of the other members pgWAP-2 and pgWAP-3 have been deposited in the DDBJ data bank with accession numbers D50320 and D50321, respectively. Interestingly, short interspersed repetitive elements (SINEs) were found in intron 2 of the three genes. Fig. 3 shows the amino acid sequences of the three members of the porcine elafin family deduced from the open reading frames of their genes by reference to the exon-intron organization of the human elafin gene (22). All three members contained, in common, a presequence of about 25 amino acids, a transglutaminase substrate domain composed of 10-17 semiconserved repeats of 6 amino acids rich in Lys and Gln, and an inhibitor domain where the 8 Cys residues responsible for forming the four-disulfide core structure or WAP motif (11) are conserved.
FIG. 1 (legend, partially recovered). Restriction enzyme sites are also shown to illustrate a high degree of similarity of the three porcine genes even in the intron sequences. Gaps are introduced in regions a and b to produce optimal alignment; the gaps in region a are due to the difference in the number of repeats, and the gap in region b of the human elafin gene is due to the presence of the SINE sequence (PRE-1, small striped boxes) (29,31) in the three porcine genes.
Identification of WAP-1 as Porcine Elafin-Among the three members identified above, WAP-1 exhibited the highest similarity to human elafin in the amino acid sequence (Fig. 3), suggesting that WAP-1 is the porcine homolog of human elafin. To confirm this, we chemically synthesized the three polypeptides (pWap-1, pWap-2, and pWap-3) and examined their abilities to inhibit elastase. As expected, pWap-1 inhibited porcine pancreatic elastase with an IC50 of approximately 2 μg/ml (Fig. 4); however, pWap-2 and pWap-3 exerted no inhibitory effects on the elastase, suggesting that they display a relatively narrow and quite distinct spectrum of anti-proteinase activities.
Mutational Burst in a Short Segment of the Genes Encoding Inhibitor Domains of Elafin Family Members-Comparison of the nucleotide sequences of the elafin family genes revealed a surprisingly high similarity with each other (Table I, Fig. 5). The highly conserved regions include the two introns, exon 1 encoding the presequence, and exon 3. Certain regions of exon 2, however, share only 60-77% nucleotide identity; the highest divergence is in the active center coding region. This degree of divergence is in striking contrast to the high conservation of the other regions that display at least 97-98% identity. It should be noted, however, that within the variable region, the [...] (Table II). The ratio KA/KS measures the rate of amino acid replacement in a selected region relative to the neutral rate; a ratio less than 1 indicates that the amino acid replacement rate is less than the neutral rate, and, generally, this is the case because of functional constraints placed on the proteins. In fact, in most regions of the elafin family genes, the KA values are lower than the KS values (Table II). In the regions coding for the hypervariable regions 1 and 2 (Fig. 5), however, the KA values significantly exceed the KS values, suggesting that the hypervariable regions have undergone accelerated evolution.
The comparisons, especially the intraspecies and interspecies comparisons of the intron sequences, strongly suggest that the porcine elafin gene duplication and diversification occurred after the porcine lineage had diverged from the human lineage. This conclusion is supported by the presence of a SINE in the porcine genes and its absence from the human elafin gene (Fig. 1; see below and "Discussion").
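Region-by-region identities of the kind compared here can be computed from a pairwise alignment along the following lines. This is a sketch only: the function names, the gap character, the convention of skipping gapped columns, and the toy sequences in the usage note are assumptions for illustration, not the procedure or data of the present study, which used GENETYX-MAC.

```python
def percent_identity(aligned1, aligned2, gap="-"):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned1) != len(aligned2):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for a, b in zip(aligned1.upper(), aligned2.upper()):
        if a == gap or b == gap:
            continue  # assumption: gapped columns are excluded from the count
        compared += 1
        matches += a == b
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def identity_by_region(aligned1, aligned2, regions):
    """Percent identity per named region.

    `regions` maps a region name to a half-open (start, end) range of
    alignment columns, e.g. {"intron 1": (120, 980)} (hypothetical values).
    """
    return {
        name: percent_identity(aligned1[start:end], aligned2[start:end])
        for name, (start, end) in regions.items()
    }
```

With the toy alignment `identity_by_region("AAAACCCC", "AAAACCGG", {"intron": (0, 4), "exon": (4, 8)})`, the first region is fully identical and the second is half identical, which mirrors in miniature the contrast between conserved introns and a divergent exon segment.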
Size Variations of Repetitive Sequence in Exon 2-The 5′ half of exon 2 of each elafin family gene encodes the transglutaminase substrate domain or cementoin-like moiety (3, 4) that consists of several semiconserved repeats of 6 amino acids. The corresponding gene sequences are also composed of semiconserved tandem repeats of 18 nucleotides. Although the repetitive sequences are highly conserved at both the nucleotide and amino acid levels among the family members, the numbers of repeats are variable (Figs. 3, 5, and 6). Such size polymorphism presumably arose from "slippage" of the DNA polymerases or gene conversions or both, as often occurs in regions containing repeated sequences. Fig. 6 compares the repetitive sequences of pgWAP-1, -2, and -3 by highlighting the nucleotide positions that are different from the consensus sequence AAAGGTCAAGATCCAGTC, which revealed an interesting pattern of similarity among the family members; for example, the repetitive sequences can be subdivided into 3 blocks, each of which consists of several 18-nucleotide repeating units (blocks A, B, and C in Fig. 6). pgWAP-1 and pgWAP-2 have block A in common; pgWAP-1 and pgWAP-3, a part of block A; and pgWAP-2 and pgWAP-3, block B and a part of block C. To obtain some insight into how these blocks of repeating sequences were generated, we constructed a phylogenetic tree (dendrogram) of the three porcine WAP family genes based on their nucleotide sequence similarities and tried to locate, on the dendrogram, the points of insertion or deletion of the blocks (Fig. 6A). The following series of events may have occurred. 1) Block A was inserted before the first duplication of the ancestor gene; 2) block B was inserted after the first but before the second duplication that produced pgWAP-2 and pgWAP-3; 3) block C was inserted following the second duplication; and 4) block D was deleted after the second duplication from the pgWAP-3 trait. The insertion of block C is considered to be a relatively recent event since the repeating unit in the block is most highly conserved.
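The comparison of repeat units against the consensus can be sketched in a few lines of Python. The consensus string is the one quoted in the text; everything else (function names, the dot-masking convention, the variant unit in the usage note, and the assumption that each unit is read in frame from its first nucleotide) is hypothetical and serves only to illustrate the idea.

```python
BASES = "TCAG"
# Standard genetic code with codons enumerated in TCAG order ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

CONSENSUS = "AAAGGTCAAGATCCAGTC"  # 18-nucleotide consensus repeat from the text


def translate(dna):
    """Translate an uppercase DNA string codon by codon, ignoring a trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def mask_against_consensus(repeat_region, consensus=CONSENSUS):
    """Split a repeat region into consensus-length units and replace each
    nucleotide that matches the consensus with '.', so only differences remain."""
    size = len(consensus)
    units = [repeat_region[i:i + size] for i in range(0, len(repeat_region), size)]
    return ["".join("." if a == b else a for a, b in zip(unit, consensus)) for unit in units]
```

Read from its first nucleotide, the consensus translates to the hexapeptide KGQDPV under the standard genetic code. Calling `mask_against_consensus(CONSENSUS + "AAAGGTCAAGATCCAGTT")`, where the second unit is an invented variant, yields one fully masked unit and one unit showing a single `T` at the last position.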
SINE in Intron 2-An approximately 250-bp porcine SINE, also known as PRE-1 (29-33), was found in the second intron of each elafin family gene (Figs. 1 and 2). The three SINE sequences, inserted in the opposite direction, are highly homologous and flanked by a 9-bp direct repeat, GAGTGTTTG. They have an internal RNA polymerase III promoter with a typical A and B box consensus, a property common to SINEs (34).
Tissue-specific Expression-To define the localization of the elafin family members, we performed an RNase protection assay using 32P-labeled antisense RNA probes representing the variable regions of exon 2 and total RNA preparations from the following porcine tissues: cerebrum, cerebellum, trachea, lung, atrium, stomach, duodenum, small intestine, and large intestine. After hybridization of the RNA preparations with the antisense probes, the samples were digested with RNases A and T1, and the protected fragments were analyzed by gel electrophoresis. The elafin family members exhibited quite distinct tissue distributions (Fig. 7); for example, the pWAP-1 (elafin) message is abundantly expressed in the trachea and large intestine, whereas that of pWAP-2 appears to be confined to the small intestine. pWAP-3 was expressed at relatively low levels in the large intestine.
FIG. 4 (legend, partially recovered). [...] Fig. 3) of the porcine WAP-1, WAP-2, and WAP-3 proteins were chemically synthesized and assayed for their inhibitory activities against elastase using a fluorogenic substrate. Assay mixtures contained 1 μg of elastase and 10 μg of the synthetic peptides or 0.3 μg of human (h) elafin in 2 ml.

DISCUSSION
In the present paper, we reported that the porcine elafin family genes show an exceptionally high rate of nonsynonymous nucleotide substitutions in a narrow region encoding the mature protein sequence; the overall sequence similarity outside this narrow region, including the intron sequences, is very high. A similar case has been reported for the genes of snake venom phospholipase A2 isozymes (18,19). Among the reports describing accelerated evolution, the above two reports are unique in demonstrating an unexpectedly high conservation of introns since the evolution rates of introns are generally considered to be much greater than those of exons (35); the other studies concerning 1) α1-proteinase inhibitor or α1-antitrypsin, a major plasma proteinase inhibitor (15,16,36) and 2) plant thionins (17) have demonstrated greater substitution rates in the reactive center region based on the protein or cDNA sequences. For example, in the case of the plasma proteinase inhibitor α1-antitrypsin, certain species such as mouse (16), guinea pig (37,38), and rabbit (36), unlike humans who have a single gene (39), have been suggested, by cDNA cloning, to have multiple genes that are very similar in the overall exon sequences (97-98% identity) but show a strikingly high level of sequence divergence at a short segment coding for a 9-amino acid stretch encompassing the reactive center. Such an unusual kind of divergence has been interpreted as a result of positive Darwinian evolution, thereby creating new reactive site sequences with varying specificities to cope with an increasing number of attacking proteinases. The localization of the porcine elafin family members in the trachea and intestine (Fig. 7) is consistent with this view since such regions are exposed to a variety of infectious agents such as bacteria and parasites. Another explanation for the divergence in the narrow region is the neutralist theory, which suggests that whenever an exceptionally high rate of substitutions is encountered in molecular evolution, we should suspect loss of constraint that allows previously harmful mutants to become selectively neutral (13). The variable region is indeed located in a surface loop that is not crucial for maintaining the tertiary structure of the protein, so that mutations in this region are not selected against, as opposed to other parts of the protein (40,41). In the case of the neutralist concept, however, the question remains as to why the intron sequences of the snake venom phospholipase A2 isozyme (18,19) and porcine elafin gene families are conserved much more highly than those of the exons coding for the hypervariable regions. The first identification, reported here, in the mammalian genomes of a family of genes that are almost identical including the intron sequences, except for a short stretch of exons where a mutational burst is observed, should contribute to better insight into the molecular mechanisms of evolutionary diversification. Furthermore, the presence of short interspersed repetitive elements (SINEs) in the conserved introns may make the genes suitable for such analyses of diversification of duplicated genes as discussed below.

TABLE II (partially recovered). [...]/KS and KA/KS values between pairs of elafin family genes. Abbreviations are as in Table I. Columns: Region; pgWAP-1 vs. pgWAP-2; pgWAP-1 vs. pgWAP-3; pgWAP-2 vs. pgWAP-3.
FIG. 6. Phylogenetic tree of the porcine WAP family members and generation of divergence in the repetitive sequences. A, phylogenetic tree constructed based on the amino acid and nucleotide sequences of the family members and their genes. B, alignment of repetitive sequences of pgWAP-1, -2, and -3. Only the nucleotides different from the consensus sequence are shown. The nucleotides boxed in black represent those different from the consensus but conserved among the family members, which helped to define a patchwork pattern of resemblance (blocks A, B, and C) in the repetitive sequence of the three members.
Eukaryotic genomes contain repeated sequence elements dispersed to thousands of locations. SINEs are the most abundant of such elements and constitute 1-5% of the total mass of DNA. Each species has its own SINEs; for example, the human SINE, Alu family, and the porcine SINE, PRE-1, are quite different. This fact and their common properties such as the presence of an internal RNA polymerase III promoter, an A-rich tract at the 3′ side, and a direct terminal repeat at the 5′ and 3′ ends strongly suggest that SINEs are retroposons dispersed through an RNA intermediate after speciation (34). SINEs are therefore useful markers for estimating relative dates of gene duplications and for constructing gene phylogenies (42). In the present study, SINEs were detected in the second introns of the porcine elafin gene family. In the human elafin gene, such a sequence is not present (Fig. 1) (22). This fact suggests that duplications of the porcine elafin gene and evolution, by the following diversification, of the three genes cloned here occurred after the speciation of porcines and humans.
FIG. 7. Tissue distribution of porcine elafin family members determined by RNase protection analysis. Total RNA (5 μg) from the indicated porcine tissues was annealed with 32P-labeled cRNA probes specific for the porcine WAP-1, WAP-2, and WAP-3 messages and digested with RNases A and T1. The protected fragments were then analyzed by electrophoresis and fluorography. The autoradiogram represents an exposure of 5 days.