Cercarial elastase is encoded by a functionally conserved gene family across multiple species of schistosomes.

Waterborne cercariae of the trematode genus Schistosoma rapidly penetrate host skin. A single serine protease activity, cercarial elastase, is deposited in advance of the invading parasite by holocytosis of vesicles from ten large acetabular gland cells. Cercarial elastase activity is a composite of multiple isoforms. Genes coding for the isoforms can be divided into two classes by amino acid and promoter sequence homology. Two of the five genes identified in Schistosoma mansoni account for over 90% of the activity and protein released; the remaining genes produce little protein or are silent. Positional scanning synthetic combinatorial substrate libraries demonstrate that the two major isoforms have similar substrate specificities and are, therefore, isoenzymes. The closely related Schistosoma hematobium and the distantly related Schistosomatium douthitti also contain multiple orthologous cercarial elastase genes, suggesting that gene duplication may have occurred after speciation in Schistosoma evolution and that this duplication has been conserved.

Cercariae, the aquatic infective larval stage of schistosomes, are highly adapted to penetrate host skin rapidly upon contact. Enzymatic hydrolysis of host proteins is required for successful entry into the host vascular system (1). Two gland systems, the preacetabular and postacetabular glands, release proteases and make up most of the volume of the cercarial head. Each gland cell releases proteases at the leading edge of the invading parasite through long, microtubule-lined cell processes or "ducts" that exit at the anterior head (2). The postacetabular glands also deposit mucin, which provides an adhesive surface by which the parasite initially attaches to the skin. Considering the diverse set of macromolecular barriers the cercariae must breach during invasion, we previously investigated the possibility that multiple enzyme activities were required. However, only a single protease activity, cercarial elastase, was found to be present in acetabular gland secretions and required for invasion (3).
Cercarial elastase is a trypsin family serine protease, named for its ability to cleave insoluble elastin, a major component of the dermis (4,5). Its P1 substrate specificity (1) is for large hydrophobic side chains, but in contrast to chymotrypsin (3), cercarial elastase is more active against macromolecular substrates than against synthetic tetrapeptides.
We examined the complement of genes coding for cercarial elastase in Schistosoma mansoni and found a family of isoforms that can be divided into two classes by amino acid and promoter sequence homology. This gene family is also conserved in two other schistosome species, Schistosoma hematobium and Schistosomatium douthitti. The two most highly expressed S. mansoni isoforms account for >90% of the released activity and are virtually identical in their biochemical properties.

EXPERIMENTAL PROCEDURES
Collection of Cercarial Secretions-Approximately 3-5 × 10⁵ S. mansoni cercariae (Puerto Rican strain) were collected in 750 ml of distilled water from 200-300 infected Biomphalaria glabrata snails using a previously reported light induction method (6). Secretions were collected by placing cercariae in Petri dishes coated with linoleic acid (to simulate skin contact) and floated in a 37°C water bath to produce a thermal gradient. After 2 h, the conditioned water was filtered through Swinnex-47 Grade 541 filters (Millipore, Bedford, MA) to remove cercarial bodies and debris. The secretions were then lyophilized and stored at 4°C. Each batch of collected material constitutes one "snail shed".
Protease Chromatography-Cercarial secretions from four snail sheds were pooled, resuspended in 6 ml of gel filtration buffer (200 mM sodium acetate, pH 6.5), and centrifuged for 30 min at 10,000 rpm in an SA-34 rotor (4°C). The supernatant was loaded onto an SR 16/100 column (Amersham Biosciences) packed with Sephacryl 200 gel filtration resin (Amersham Biosciences). The column was run at 25 ml/h, and 4-ml fractions were collected overnight at 4°C. Fractions were assayed using 10 µl of sample and 100 µl of assay buffer (100 mM glycine, pH 9.0, 100 µM succinyl-Ala-Ala-Pro-Phe p-nitroanilide (AAPF-pNA)). Absorbance kinetics were determined using a UV-Max spectrophotometer and SoftMax Version 2.02 software (Molecular Devices, Sunnyvale, CA).
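As an illustration of how a kinetic absorbance trace reduces to a single activity value per fraction, the sketch below fits an ordinary least-squares line and reports its slope as the initial rate. This is an assumption about the analysis, not a description of what SoftMax does internally; the readings and the 405-nm wavelength are hypothetical.

```python
"""Estimate an initial hydrolysis rate from a kinetic absorbance trace.

Hypothetical sketch: the time points and A405 readings are invented for
illustration and are not data from this study.
"""


def initial_rate(times_min, absorbances):
    """Return the least-squares slope (absorbance units per minute)."""
    n = len(times_min)
    if n != len(absorbances) or n < 2:
        raise ValueError("need at least two paired readings")
    mean_t = sum(times_min) / n
    mean_a = sum(absorbances) / n
    # Ordinary least squares: slope = cov(t, A) / var(t)
    cov = sum((t - mean_t) * (a - mean_a) for t, a in zip(times_min, absorbances))
    var = sum((t - mean_t) ** 2 for t in times_min)
    return cov / var


if __name__ == "__main__":
    # Hypothetical readings from one fraction, one reading per minute
    times = [0, 1, 2, 3, 4, 5]
    a405 = [0.050, 0.071, 0.090, 0.111, 0.130, 0.151]
    print(f"initial rate: {initial_rate(times, a405):.4f} A/min")
```

Only the early, linear part of a trace should be fitted; once substrate depletion bends the curve, the slope underestimates the initial rate.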
Fractions with high AAPF-pNA activity were pooled and buffer-exchanged into running buffer (20 mM MES, pH 6.8) using PD-10 buffer exchange columns (Amersham Biosciences). The sample was then loaded onto an HR 5/5 Mono-Q anion exchange column at 1 ml/min. Protein was eluted with a combination of discontinuous steps and gradients of elution buffer (20 mM MES, pH 6.8, 1.0 M NaCl): three elution steps separated by linear gradients (fractions 0-20, 0-125 mM NaCl; fractions 21-25, 225-300 mM NaCl; fractions 26-30, 300-1000 mM NaCl). Fractions were assayed for protease activity as described above.

*This work was supported by a Merit Grant from the Department of Veterans Affairs and the Sandler Family Foundation.
The nucleotide sequence(s) reported in this paper has been submitted to GenBank™.

Peaks of activity from the ion exchange column were analyzed by SDS-PAGE. NuPAGE 4-12% bis-Tris polyacrylamide gradient gels (Invitrogen) were used according to the manufacturer's instructions. Gels were stained with Coomassie Blue to visualize protein and assess purity.
Recombinant SmCE-1a and Antibody Production-An Escherichia coli expression construct, pET-21a-SmCE-1a-I28M-6xH, was assembled by inserting the active portion of SmCE-1a, amplified by PCR, into the NdeI (blunt) and XhoI restriction sites of the vector pET-21a (Invitrogen). The forward primer 5′-TGCGTAGTGGTGAACCTGTGCA-3′ created a blunt end matching the blunt NdeI site of the vector, changing the first amino acid to a start codon (Ile-28→Met). The reverse primer 5′-CAGATCCTCGAGAATATTGGAGCGTACAAAATCC-3′ added an XhoI site (CTCGAG) and removed the stop codon, creating an open reading frame into the pET-21a vector that added a 6× histidine tag to the C terminus of the protein. Purified recombinant His-tagged protein from E. coli was then used to raise rabbit antisera using standard commercial procedures (Covance, Richmond, CA).
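A quick sanity check on the cloning primers is to confirm that the XhoI recognition sequence sits where intended. The sketch below copies the two primer sequences from the text; the helper names are hypothetical and only Python string methods are used.

```python
"""Check restriction sites in the SmCE-1a cloning primers (sketch).

Primer sequences are copied from the text; helper names are hypothetical.
"""

XHOI_SITE = "CTCGAG"  # XhoI recognition sequence (cuts C^TCGAG)

FORWARD = "TGCGTAGTGGTGAACCTGTGCA"
REVERSE = "CAGATCCTCGAGAATATTGGAGCGTACAAAATCC"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_site(primer, site):
    """Return the 0-based index of the first occurrence of site, or -1."""
    return primer.find(site)


if __name__ == "__main__":
    print("XhoI site in reverse primer at index", find_site(REVERSE, XHOI_SITE))
    print("XhoI site in forward primer at index", find_site(FORWARD, XHOI_SITE))
    # CTCGAG is palindromic, so the same site is found on either strand
    print("palindromic:", reverse_complement(XHOI_SITE) == XHOI_SITE)
```

Because CTCGAG is palindromic, a forward-strand search is sufficient; non-palindromic sites would also require searching the reverse complement.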
Immunoblot Analysis-Proteins were separated by SDS-PAGE as described above and transferred onto polyvinylidene difluoride membranes using the Novex transfer system (Invitrogen). Blots were probed with polyclonal rabbit serum. Horseradish peroxidase-conjugated anti-rabbit secondary antibody was used for detection with standard ECL reagents (Amersham Biosciences).
Promoter Identification by PCR-based Genomic DNA Walking-Genomic libraries were constructed by overnight digestion of 10 µg of genomic DNA and 3 µg of pUC19 plasmid (Stratagene, La Jolla, CA) with individual restriction enzymes (AvrI, BamHI, BclI, EcoRI, HindIII, KpnI, PstI, SacI, SalI, SpeI, XbaI, and XhoI; New England Biolabs, Beverly, MA). Matching digests of genomic DNA and plasmid DNA (acting as a universal linker) were mixed and purified using a spin prep column (Qiagen, Valencia, CA). Purified samples were ligated overnight at 14°C to link the plasmid to the genomic fragments. First-round nested PCRs were set up in 50-µl volumes using 2 µl of library DNA as template and the primer sets 5′-GATTACGCCAAGCTTGCATGCCTG and 5′-TGCGATGAACGGGAATTCAG for cercarial elastase, or 5′-GATTACGCCAAGCTTGCATGCCTG and 5′-TGAAGCTATTCTTCGTACATGATAA for GAPDH. Thermocycling conditions were 45 s at 94°C, 1 min at 60°C, and 3 min at 72°C for 30 cycles. Second-round PCRs were set up in 100-µl volumes using 0.5 µl of first-round PCR product as template and the primer sets 5′-GATTACGCCAAGCTTGCATGCCTG and 5′-GCACAGGTTCACCACTACG for cercarial elastase, or 5′-GATTACGCCAAGCTTGCATGCCTG and 5′-AATAACTCCGGTGTCTTAAGTCATA for GAPDH. Thermocycling conditions were as described above. Taq polymerase (Roche Molecular Biochemicals) was used in all PCRs.
Second round PCR products were resolved in 1% agarose gels. Individual DNA bands (1-3 kb) were excised, purified with gel extraction columns (Qiagen), and cloned using the TOPO TA cloning vector (Invitrogen). Plasmids containing inserts of correct size were sequenced.
Isoform Identification by RT-PCR-A primer was designed to bridge the deletion region within the group 2 promoters (see Fig. 3A). This primer was specific for group 2 transcripts and was used in an RT-PCR to amplify the downstream expressed genes.
Poly(A) mRNA was isolated by poly(T) affinity chromatography (Amersham Biosciences) from the hepatopancreases of five infected snails. Cercariae develop in this region of the host snail, and cercarial elastase transcripts are present only during this developmental stage. RNA was converted to cDNA using avian myeloblastosis virus reverse transcriptase (Invitrogen) as described by the manufacturer. PCR was performed using 5 µl of first-strand cDNA as template and the primer set 5′-TTGCAACATTCACATAGACAT and oligo(dT)15. Thermocycling conditions were 30 s at 94°C, 30 s at 55°C, and 1 min at 72°C for 30 cycles using Taq polymerase. PCR products were resolved in 1% agarose gels. Individual DNA bands were excised, purified with gel extraction columns (Qiagen), and cloned into the TOPO TA cloning vector (Invitrogen). Plasmids containing inserts of the correct size were bidirectionally sequenced at the Biomolecular Resource Center of the University of California, San Francisco.
N-terminal Peptide Sequencing-Proteins were blotted onto polyvinylidene difluoride membranes and visualized by Coomassie Blue staining. Peptide sequencing was carried out using an ABI Procise 491 protein sequencer (Applied Biosystems, Foster City, CA) at the Biomolecular Resource Center of the University of California, San Francisco.
Isoform Identification by Degenerate PCR-Degenerate PCR primers targeted to conserved regions of serine proteases were used to identify cercarial elastase homologues as previously described (7). Genomic DNA or cDNA was used as template. PCR was performed using 0.5 µg of template DNA and the primer set 5′-CCITTICTITAICGIRAIRA and 5′-CCICTR(Y/R)ICCICCIGGIRA. Thermocycling conditions were 30 s at 94°C, 1 min at 45°C, and 1 min at 72°C for 35 cycles using Taq polymerase. Individual DNA bands were excised, purified, and cloned. Plasmids containing inserts of the predicted size were sequenced.
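The degeneracy of a mixed primer (how many distinct sequences the synthesis contains) follows directly from its IUPAC codes. The sketch below assumes the standard IUPAC nucleotide ambiguity codes and treats inosine (I) as a single base that adds no degeneracy; it handles the first primer above but not the "(Y/R)" notation in the second, whose intended meaning is not spelled out in the text.

```python
from itertools import product

# Standard IUPAC nucleotide codes; inosine (I) is one modified base that
# pairs broadly, so it contributes no sequence degeneracy.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
    "I": "I",
}


def expand(primer):
    """Return every concrete sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


def degeneracy(primer):
    """Count the distinct sequences without enumerating them."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count


if __name__ == "__main__":
    primer = "CCITTICTITAICGIRAIRA"  # first degenerate primer from the text
    print(degeneracy(primer), "distinct sequences")
    for seq in expand(primer):
        print(seq)
```

Placing inosine at the most variable codon positions, as in this primer, keeps the mixture small (two R positions give only four sequences) while still tolerating wobble-position variation.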
Isoform Identification by Hybridization of a Phage Library-A phage library was made according to the manufacturer's instructions (Stratagene) using mRNA from snail hepatopancreas infected with S. hematobium sporocysts. A radioactive probe from the cDNA of SmCE-1a (Sm, S. mansoni; CE, cercarial elastase; 1, first gene identified; a, first isoform of gene) was used to screen the library. Positive plaques were serially diluted and probed again to ensure that clones were homogeneous. Colonies were then recovered and sequenced.
Positional Scanning-Synthetic Combinatorial Library Screening-Production of diverse chemical libraries has been described previously in detail (8). Position P1 was scanned using the P1-diverse peptide substrate library. Positions P4-P2 were scanned using the P1-Leu library. Native cercarial elastase (1 µg) was used per well with a final sublibrary substrate concentration of 0.1 µM for the P1-Leu library and 0.04 µM for the P1-diverse library. Reactions were performed at 25°C in assay buffer (50 mM Tris, pH 8.0, 100 mM NaCl, 0.01% Tween 20) for 1 h. Reactions were started by adding enzyme and monitored with a PerkinElmer LS50B luminescence spectrometer, with excitation at 380 nm and emission at 450 or 460 nm. Because of limited enzyme availability, the P1-diverse library was assayed with an equal mixture of SmCE-1a and SmCE-1b. The P1 position data were confirmed with individual substrates for both enzymes (data not shown).
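Library results of this kind are reported on a 0-100 scale relative to the optimal residue at each position. A minimal sketch of that normalization is shown below, assuming each sublibrary signal is a background-corrected ACC release rate; the signal values and the 10% "preferred" cutoff are hypothetical choices, not values from this study.

```python
"""Normalize positional-scanning sublibrary signals to a 0-100 scale.

Sketch under stated assumptions: each value is a background-corrected
ACC release rate for one sublibrary, and the best residue at a position
is set to 100. All numbers below are hypothetical.
"""


def normalize_position(signals):
    """Scale one position's signals so the optimal residue equals 100."""
    best = max(signals.values())
    if best <= 0:
        raise ValueError("no positive signal at this position")
    # Below-background (negative) values are clipped to 0
    return {aa: 100.0 * max(s, 0.0) / best for aa, s in signals.items()}


def preferred(signals, threshold=50.0):
    """Residues whose normalized activity meets threshold, best first."""
    norm = normalize_position(signals)
    hits = [aa for aa, value in norm.items() if value >= threshold]
    return sorted(hits, key=norm.get, reverse=True)


if __name__ == "__main__":
    # Hypothetical P2 sublibrary rates (arbitrary fluorescence units/min)
    p2 = {"P": 5000.0, "A": 900.0, "G": 120.0, "L": -15.0}
    print(normalize_position(p2))
    print(preferred(p2, threshold=10.0))
```

Normalizing per position makes the preference profiles of two enzymes comparable even when their absolute activities differ.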

Confirmation of Positional Scanning-Synthetic Combinatorial Library Screening Results by Assay of Specific Tetrapeptide Substrates-Four fluorogenic (ACC) tetrapeptide substrates (SWPL, TWPL, RWPL, RRPL) were synthesized as previously described (8). Substrates were assayed using 5 µl of an equal mixture of SmCE-1a and SmCE-1b (0.5 µg), 100 µl of assay buffer (50 mM Tris, pH 8.0, 100 mM NaCl, 0.1% Tween 20), and substrate at 25 µM. This substrate concentration was determined to be well below the Km, allowing direct comparison of the kinetics of the four substrates. Absorbance kinetics were determined using a UV-Max spectrophotometer and SoftMax Version 2.02 software (Molecular Devices).
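The reason a substrate concentration well below Km permits direct comparison follows from the standard Michaelis-Menten rate law (a textbook relation, assuming initial-rate conditions, rather than a derivation given in this paper): the rate becomes proportional to kcat/Km, so the ratio of two rates measured at the same enzyme and substrate concentrations estimates the ratio of specificity constants.

```latex
\begin{aligned}
v_0 &= \frac{k_{\mathrm{cat}}\,[E]_0\,[S]}{K_m + [S]}
     \;\approx\; \frac{k_{\mathrm{cat}}}{K_m}\,[E]_0\,[S]
     && \text{when } [S] \ll K_m,\\[4pt]
\frac{v_{0,i}}{v_{0,j}} &\approx
     \frac{(k_{\mathrm{cat}}/K_m)_i}{(k_{\mathrm{cat}}/K_m)_j}
     && \text{for substrates } i, j \text{ at equal } [E]_0 \text{ and } [S].
\end{aligned}
```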
Isoform and Species Sequence Relationship-All available amino acid sequence information was aligned using the GCG pileup program (Oxford Molecular Group, Cambridge, UK) and visually inspected for misalignments. A consensus sequence generated from this alignment was used to determine the nearest neighbors with the BLAST program (9). These sequences were then used in combination with the schistosome sequences to produce a phylogenetic tree using the protein parsimony method of the PHYLIP software package (10).
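To show what a parsimony score measures, the sketch below applies the classic Fitch algorithm to aligned amino acid columns on a fixed rooted binary tree. This is a simplification and not the PHYLIP protpars method, which scores amino acid changes with reference to the genetic code and searches over tree topologies; the tree, sequence names, and four-residue alignment are hypothetical.

```python
"""Fitch parsimony score for aligned protein sequences on a fixed tree.

Simplified sketch (plain Fitch counting of amino acid changes); the
tree, names, and sequences are hypothetical, not data from this study.
"""


def fitch(tree, states):
    """Return (candidate state set, minimum changes) for a subtree.

    tree is either a leaf name (str) or a pair (left, right).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    # Disjoint child sets require one change at this node
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Sum the minimum number of changes over all alignment columns."""
    length = len(next(iter(alignment.values())))
    total = 0
    for col in range(length):
        column = {name: seq[col] for name, seq in alignment.items()}
        total += fitch(tree, column)[1]
    return total


if __name__ == "__main__":
    tree = (("Sm1", "Sm2"), ("Sh1", "Sd1"))
    aln = {"Sm1": "SWPL", "Sm2": "SWPL", "Sh1": "TWPL", "Sd1": "TWAL"}
    print("parsimony score:", parsimony_score(tree, aln))
```

A parsimony program compares such scores across candidate topologies and keeps the tree requiring the fewest changes.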

RESULTS
Purification of Cercarial Elastase Reveals Multiple Isoforms-Cercarial elastase purification requires two steps. Gel filtration of cercarial secretions produces a single broad peak of elastase activity, detected with the tetrapeptidyl substrate AAPF-pNA (Fig. 1A). Anion exchange chromatography was then used to purify cercarial elastase activity further. A shallow salt gradient revealed three peaks of activity (Fig. 1B), each associated with a 25-kDa protein (Fig. 1C). Polyclonal antiserum raised against pure, recombinant, non-glycosylated cercarial elastase of bacterial origin (2) was used in an immunoblot analysis of these three peaks. The strong reactivity of each band suggested high sequence similarity (Fig. 1D). The lower 16-kDa band observed in each lane was a degradation product resulting from autoproteolysis (5).
Identification of Isoform Promoters Reveals Two Classes of Cercarial Elastase-A PCR strategy was adopted to clone regions of genomic DNA upstream of cercarial elastase genes (Fig. 2). Multiple libraries of genomic DNA, digested with individual restriction enzymes and ligated with universal linkers, were constructed. Two assumptions were made about the protease gene family. First, a reverse primer homologous to the N-terminal portion of the gene family would bind all elastase isoforms. Second, restriction sites in the non-coding upstream region would be heterogeneous. PCR amplification using a forward primer homologous to the universal linker and the cercarial elastase reverse primer yielded multiple products that varied with each library (Fig. 2A). Cloning and sequencing of the individual PCR products confirmed that the bands were upstream genomic DNA from multiple related elastase genes. As a control, reverse primers to the GAPDH gene were also used in a separate experiment. GAPDH has been previously shown to be a single copy gene (11,12), and PCR reactions yielded single products for each library, consistent with this finding (Fig. 2B).
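The reasoning behind the genome-walking test can be made concrete: in each restriction library, a gene copy yields one product whose length is set by the nearest upstream site for that enzyme, so several copies with heterogeneous site positions give several bands, while a single-copy gene such as GAPDH gives one. The sketch below encodes that logic; the site distances and copy names are hypothetical.

```python
"""Predict genome-walking band sizes for one restriction library (sketch).

Site distances and copy names are hypothetical, chosen for illustration.
"""


def band_sizes(site_distances, linker_bp=0):
    """Return the sorted distinct product sizes expected for one enzyme.

    site_distances maps each gene copy to the distances (bp) of this
    enzyme's sites upstream of the gene-specific reverse primer. The
    genomic fragment ends at the nearest site, where the linker is joined,
    so each copy gives one product of that length plus linker_bp.
    Copies with identical distances co-migrate as a single band.
    """
    sizes = set()
    for distances in site_distances.values():
        if distances:  # no site within range -> no product from this copy
            sizes.add(min(distances) + linker_bp)
    return sorted(sizes)


if __name__ == "__main__":
    multicopy = {"copy1": [1200, 3000], "copy2": [800], "copy3": [1200]}
    single_copy = {"gapdh": [950]}
    print("multi-copy gene bands:", band_sizes(multicopy))
    print("single-copy gene bands:", band_sizes(single_copy))
```

Co-migration (copy1 and copy3 here) means a single band does not by itself prove a single copy, which is why several libraries cut with different enzymes were compared.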
Two genomic clones of cercarial elastase, designated EL1 and EL2, have been reported by Pierrot et al. (13). EL1 was described as the genomic clone of the previously published cercarial elastase cDNA sequence (SmCE-1a; Table I). EL2 was reported to carry a D125A substitution in the catalytic triad, which would yield an inactive protease, and its predicted transcript was undetectable. An alignment of the predicted cDNA from EL1 with the first published cDNA (SmCE-1a) revealed only 25 single base pair substitutions (of 795), suggesting that these genes may be isoforms. A unique SpeI restriction site is present in SmCE-1a but absent in SmCE-1b. Digestion of RT-PCR products with SpeI demonstrated that both transcripts are produced and confirmed that the genes are isoforms rather than strain variants (data not shown). Exhaustive RT-PCR attempts with parasite cDNA to identify a transcript of the predicted catalytically dead gene, SmCE-1c (EL2), were similarly unsuccessful. DNA sequences previously submitted to public databases, and their nomenclature relative to this paper, are summarized in Table I.
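The two checks in this paragraph reduce to simple arithmetic and a string search: 25 substitutions over 795 bp corresponds to about 96.9% nucleotide identity, and the SpeI recognition sequence (ACTAGT) distinguishes the two transcripts. The sketch below is illustrative; the function names and test sequences are hypothetical.

```python
"""Identity arithmetic and SpeI-site check for SmCE-1a versus SmCE-1b.

Sketch only; function names and example sequences are hypothetical.
"""

SPEI_SITE = "ACTAGT"  # SpeI recognition sequence (cuts A^CTAGT)


def identity_from_substitutions(substitutions, length):
    """Percent identity of two ungapped sequences of equal length."""
    return 100.0 * (length - substitutions) / length


def has_spei(seq):
    """True if the sequence contains an SpeI site (palindromic, one strand suffices)."""
    return SPEI_SITE in seq.upper()


if __name__ == "__main__":
    pct = identity_from_substitutions(25, 795)
    print(f"EL1 vs SmCE-1a: {pct:.1f}% identical")
    # Hypothetical RT-PCR fragments: only the first carries the SpeI site
    for name, frag in {"clone_A": "GGACTAGTCC", "clone_B": "GGACTTGTCC"}.items():
        label = "SmCE-1a-like" if has_spei(frag) else "SmCE-1b-like"
        print(name, label)
```

In practice the digest does this classification physically: RT-PCR products that are cut by SpeI derive from SmCE-1a, and uncut products from SmCE-1b.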
Alignment of the upstream non-coding DNA sequences revealed a high degree of conservation centered on the TATA box. This region extends 36 bp upstream and 55 bp downstream of the TATA box and is 80% identical among the promoters. One of the isoform promoters contained a 10-bp deletion just upstream of the translational initiation codon (Fig. 3A). Additionally, three amino acids in the open reading frame of this fragment did not match the known elastase sequences, indicating another isoform. A forward primer bridging the deletion region allowed full-length cloning of the mRNA downstream of this promoter by RT-PCR. Two PCR products were generated and sequenced. Both coded for cercarial elastase, but each was significantly divergent from the previously known amino acid sequences. These results bring the total number of cercarial elastase genes in S. mansoni to five (Fig. 4).
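Percent identity over an aligned region depends on how gaps, such as the 10-bp deletion, are handled. The sketch below assumes one common convention, counting only columns in which neither sequence has a gap; the exact convention used for the 80% figure is not stated, and the short sequences are hypothetical.

```python
"""Percent identity between two aligned DNA sequences (sketch).

Assumes gap columns are excluded from the comparison; the example
sequences are hypothetical, not the promoter alignment from Fig. 3A.
"""


def aligned_identity(seq_a, seq_b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    compared = matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # skip indel columns, e.g. a promoter deletion
        compared += 1
        matches += a == b
    return 100.0 * matches / compared if compared else 0.0


if __name__ == "__main__":
    print(aligned_identity("TATAAAAGGC", "TATATAAGGC"))  # one mismatch in ten columns
    print(aligned_identity("TA-A", "TAAA"))  # gap column ignored
```

Counting gap columns as mismatches instead would lower the identity of any promoter carrying the deletion, so the convention should be reported alongside the value.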
Not All Isoforms Are Transcribed and/or Translated-N-terminal sequencing of the purified enzymes confirmed the translation of three of the five cercarial elastase genes (Fig. 3B). Isoforms SmCE-1a and SmCE-1b are the most abundant species in cercarial extracts, comprising over 90% of the protein and activity. Isoform SmCE-2a is a minor component of cercarial secretions. Isoform SmCE-2b has detectable levels of mRNA transcript, but attempts to purify active protein of this isoform were unsuccessful, indicating that its abundance is below the detection limit of our methods. The last predicted isoform, SmCE-1c (EL2), could not be detected by RT-PCR using specific primers.
Positional Scanning, Combinatorial Substrate Libraries Demonstrate That the Two Major Isoforms Are Isoenzymes-Substrate preferences between the two major species of cercarial elastase, SmCE-1a and SmCE-1b, were compared using two combinatorial peptide substrate libraries that together scanned substrate positions P4, P3, P2, and P1. Both enzymes prefer the same sets of amino acids for each of the positions scanned indicating that the isoforms are enzymatic equivalents (Fig. 5).
Position P4 has a marked preference for serine or threonine but will tolerate 18 other amino acids. Position P3 showed little preference. The P2 position is the most selective position of the enzyme, preferring only proline. The selectivity for P2 in SmCE-1b is slightly broader than in SmCE-1a, with alanine also accepted at this position. The P1 position for SmCE-1a and SmCE-1b was scanned as an enzyme mixture because of the high enzyme requirements of the P1-diverse library. Both isoforms were then tested with individual tetrapeptidyl substrates for each P1 amino acid that yielded cleavage activity and were found to be similar (data not shown).

FIG. 1. S. mansoni cercarial elastase chromatography analyzed by Coomassie, Western blot, and peptide sequencing. A, gel filtration of secretions from ~2 × 10⁶ cercariae. Fractions were measured for AAPF-pNA activity and protein concentration. Void volume contains unfractionated high molecular weight proteins. Elastase activity indicates fractions capable of degrading insoluble elastin, assayed as described previously (8). B, anion exchange of fractions with high AAPF-pNA activity from A. Samples were pooled and desalted before loading. A step gradient of salt was used to elute the activity. Fractions were assayed with AAPF-pNA. FT indicates flow-through of loaded sample. C, SDS-PAGE followed by Coomassie Blue staining of activity peaks from B. D, immunoblot using antibodies specific to recombinant cercarial elastase. Activity peaks from B are displayed. E, N-terminal peptide sequencing data for each fraction peak. Undefined amino acids are indicated by an X. Shaded amino acids indicate positions that positively identify each activity peak with its corresponding cDNA sequence.

FIG. 2. Genomic fragments amplified from S. mansoni libraries.
Genomic libraries, restricted with various endonucleases, were used as template DNA in nested PCR reactions. Forward primers were specific to a linker ligated to the restricted fragments, and reverse primers were specific to cercarial elastase (A) or GAPDH (B). Both gels show the second round of the nested PCR amplifications. Note: if multiple copies of a gene are present in the genome and restriction site locations are heterogeneous, then multiple bands would be expected. GAPDH, known to be a single copy gene in Schistosoma mansoni, was included as a control.
Individual Substrates Confirm Combinatorial Substrate Library Predictions-Four tetrapeptidyl substrates representing favorable ((S/T)W) and unfavorable (RR) amino acids at positions P4 and P3 (Fig. 5) were compared for cleavage by cercarial elastase (Fig. 6). P2 and P1 were fixed as proline and leucine, respectively. The ranking of substrate kinetics predicted from the chemical library data was confirmed (SWPL ≈ TWPL > RWPL > RRPL). An 11-fold difference in substrate kinetics was observed between a favorable (serine) and an unfavorable (arginine) amino acid at the P4 position. The P3 position also influenced substrate kinetics, with the favorable amino acid (tryptophan) giving three times more activity than the unfavorable amino acid (arginine). The best tetrapeptide substrate of the four was SWPL.
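The ranking and fold differences above are ratios of rates measured under identical conditions. The sketch below computes them from a dictionary of rates; the values are illustrative placeholders chosen only to reproduce the reported 11-fold (P4) and 3-fold (P3) differences, not the measured data from Fig. 6.

```python
"""Rank tetrapeptide substrates and compute fold differences (sketch).

RATES holds illustrative placeholder values chosen to mirror the reported
fold differences; they are not measured rates from this study.
"""

RATES = {"SWPL": 33.0, "TWPL": 32.0, "RWPL": 3.0, "RRPL": 1.0}


def fold_change(rates, a, b):
    """Ratio of the rate for substrate a to the rate for substrate b."""
    return rates[a] / rates[b]


def rank(rates):
    """Substrate names ordered from fastest to slowest cleavage."""
    return sorted(rates, key=rates.get, reverse=True)


if __name__ == "__main__":
    print("ranking:", " > ".join(rank(RATES)))
    print("P4 effect (SWPL/RWPL):", fold_change(RATES, "SWPL", "RWPL"))
    print("P3 effect (RWPL/RRPL):", fold_change(RATES, "RWPL", "RRPL"))
```

Each comparison changes a single position (P4: S versus R with W fixed at P3; P3: W versus R with R fixed at P4), so each ratio isolates the contribution of one subsite.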
S. hematobium and S. douthitti Contain Orthologous Cercarial Elastase Genes-Several methods were used to identify cercarial elastase genes in other schistosome species. Degenerate PCR using S. hematobium cDNA template produced two similar but distinct sequences, designated ShCE-1a and ShCE-1b. A phage cDNA library was also screened with a radiolabeled SmCE-1a probe and yielded a single full-length mRNA, for ShCE-1a (Figs. 4 and 7). The same PCR strategy using S. douthitti genomic DNA also produced two fragments, SdCE-1a and SdCE-1b; SdCE-1b contained a unique intron whose location does not match that of any known intron in the S. mansoni cercarial elastase sequences (Fig. 7).
Cercarial Elastase Similarity across Species Follows a General Genetic Trend-The phylogenetic relationships of all nine cercarial elastase amino acid sequences reported here were inferred using the PHYLIP software package (Fig. 8). The isoforms from each species cluster together. S. mansoni and S. hematobium are closely linked, whereas the S. douthitti enzymes branch from the S. hematobium cluster. The most homologous serine proteases from other organisms were trypsin family proteases from Metarhizium anisopliae (AF130865) and human elastase IIIA precursor (A29934).

FIG. 5. Bars indicate the level of cleavage for each library member, as judged by the release of ACC (7-amino-4-carbamoylmethylcoumarin) from substrate relative to the optimal side chain at that site, normalized from 0 to 100 (see "Experimental Procedures"). The single P1 histogram represents the combined activity of SmCE-1a and SmCE-1b.

DISCUSSION
Schistosome cercariae must penetrate the formidable barrier of skin. To do so, they have evolved a potent serine protease capable of degrading multiple macromolecular targets in skin (5). Previous studies have confirmed that this protease activity is necessary for cercarial invasion (1). Recent work has confirmed that it is the sole histolytic activity present in cercarial secretions (3). It would therefore be expected to be conserved across schistosome species. The studies reported here confirm that it is highly conserved between S. mansoni and S. hematobium and is identifiable in the more distantly related S. douthitti.
Cercarial elastase activity is encoded by a family of closely related genes. The presence of gene duplication and the resultant protease isoforms may reflect in part selection for redundancy in a key gene required for transmission from snail to human host. Comparison of the dominantly expressed cercarial elastase isoforms in S. mansoni demonstrated a high degree of similarity in their preferred substrates. This suggests that these two genes are not providing substrate cleavage diversity but may increase the total amount of protein produced in the acetabular glands through duplication.
The protease isoforms fall into two families identifiable by differences in their promoters. Expression of cercarial elastase is strictly limited to the sporocyst life-cycle stage in which cercariae develop (4,14). Within that stage, cercarial elastase in S. mansoni has been shown to be expressed in both pre- and postacetabular cells. These two cell compartments, while both contributing to cercarial secretions, produce different products. The preacetabular cells have cercarial elastase co-localized in vesicles containing high levels of calcium, whereas the postacetabular glands contain protease and mucopolysaccharides (15,16). Perhaps the two promoter families represent gene products that localize to these two separate compartments.
Alternatively, the gene redundancy could be explained by a "gene in waiting" hypothesis. The duplication and subsequent divergence of these genes provide a potential mechanism for altering the substrate specificity of the protease. Such a reservoir of genes could protect the parasite against host adaptation or facilitate expansion of its host range. This hypothesis can explain why there are multiple isoforms yet only two highly conserved genes accounting for the majority of protein produced. The hypothesis is also testable: additional data on schistosomes infecting a variety of hosts would be predicted to show differential levels of cercarial elastase gene transcripts and products, with conservation of the genes themselves.
High sequence conservation around the TATA box makes it impossible to identify specific potential transcription factor binding sites, but it does suggest that assembly of the transcription machinery would involve most of this region. A better understanding of transcriptional regulation in schistosomes will be possible as more upstream sequence information is obtained and compared between and within species.
We have shown cercarial elastase activity to be the sum of multiple protease isoforms. Protease gene duplications are observed in three schistosome species and provide an opportunity to document the evolution of a gene family in the genus Schistosoma. Positional scanning combinatorial substrate libraries were used to demonstrate that the proteases expressed from the gene family are isoenzymes.  (17).