Structure and Activity of Human Pancreasin, a Novel Tryptic Serine Peptidase Expressed Primarily by the Pancreas*

In a search for genes encoding the serine peptidases prostasin and testisin, which are expressed mainly in prostate and testis, respectively, we identified a related, novel gene. Sequencing of cDNA allowed us to deduce the full amino acid sequence of the human gene product, which we term “pancreasin” because it is transcribed strongly in the pancreas. The idiosyncratic 6-exon organization of the gene is shared by a small group of tryptic proteases, including prostasin, testisin, and γ-tryptase. Like the other genes, the pancreasin gene resides on chromosome 16p. Pancreasin cDNA predicts a 290-residue, N-glycosylated, serine peptidase with a typical signal peptide, a 12-residue activation peptide cleaved by tryptic hydrolysis, and a 256-amino acid catalytic domain. Unlike prostasin and other close relatives, human pancreasin and a nearly identical chimpanzee homologue lack a carboxyl-terminal membrane anchor, although this is present in 328-residue mouse pancreasin, the cDNA of which we also cloned and sequenced. In marked contrast to prostasin, which is 43% identical in the catalytic domain, human pancreasin is transcribed strongly in pancreas (and in the pancreatic ductal adenocarcinoma line, HPAC) but weakly or not at all in kidney and prostate. Antibodies raised against pancreasin detect cytoplasmic expression in HPAC cells. Recombinant, epitope-tagged pancreasin expressed in Chinese hamster ovary cells is glycosylated and secreted as an active tryptic peptidase. Pancreasin's preferences for hydrolysis of extended peptide substrates feature a strong preference for P1 Arg and differ from those of trypsin. Pancreasin is inhibited by benzamidine and leupeptin but resists several classic inhibitors of trypsin. Thus, pancreasin is a secreted, tryptic serine protease of the pancreas with novel physical and enzymatic properties. These studies provide a rationale for exploring the natural targets and roles of this enzyme.

Serine proteases are a fertile family of hydrolases using the side-chain hydroxyl group of a precisely positioned serine to attack the carbonyl carbon of a target peptide bond (1). Despite this shared enzymatic mechanism, serine proteases as a group exhibit a tremendous range of target specificity. However, some members of the family recognize and cleave a narrow range of target sequences and are limited in vivo to hydrolysis of essentially one type of target. An example is enteropeptidase, which is highly specific for pancreatic trypsinogens. Some enzymes, like activated pancreatic trypsin itself, are comparatively omnivorous, hydrolyzing the peptide bond of a broad range of peptides and proteins at sites containing basic amino acids. Other serine proteases cleave targets after aromatic, neutral aliphatic, or acidic residues, but mammalian serine proteases with tryptic specificity are particularly numerous and variable in form and function. These include many familiar proteases with roles in digestion, hemostasis, fibrinolysis, and activation of complement (2). One of the more intriguing subgroups of tryptic serine proteases includes prostasin (3-5), testisin (6-8), and γ-tryptase (9,10). These enzymes are tryptic in specificity (i.e. prefer arginines and lysines in target peptides) and are synthesized with a distinctive carboxyl-terminal peptide or glycosylphosphatidylinositol membrane anchor. Subsequently, they may be released from their anchor and secreted. The genes of these three enzymes share an idiosyncratic organization of introns and exons and reside on the short arm of chromosome 16 (5,10,11). However, they differ widely in dominant tissue pattern of expression: i.e. kidney and prostate (prostasin) (3,4,12), eosinophils, testicular germ cells and sperm (testisin) (6,8,13), and airway and gut mast cells (γ-tryptase) (9,10). The functions of these proteases are being actively investigated.
In the case of prostasin, one likely role that has emerged is regulation of transmembrane ion flux via epithelial sodium channels (14). This non-classic regulatory role for one member of the prostasin subgroup of tryptic mammalian serine proteases hints that we can expect unconventional roles for other members of the subgroup.
This laboratory's interest in γ-tryptase and prostasin (10) led us to seek genes and transcripts encoding related enzymes in the human genome. As detailed below, our search identified a new family member, which we term "pancreasin" because it appears to be predominantly transcribed by pancreatic tissue as well as by a cell line derived from pancreatic ductal epithelium. The pancreasin gene shares the idiosyncratic gene structure of human prostasin, testisin, and γ-tryptase and resides like the others on chromosome 16p. Furthermore, recombinant expression reveals that it is a catalytically competent tryptic peptidase and proteinase. However, its substrate preferences and inhibitor profile are unique and, unlike its closest relatives, it is synthesized and secreted without a membrane anchor. The distinct pattern of expression and the distinct catalytic and structural features predict that pancreasin's functions differ from those of its closest known relatives.

The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™

MATERIALS AND METHODS
Data Base Screening-Human γ-tryptase and prostasin cDNA sequences and Basic Local Alignment Search Tool (available at www.ncbi.nlm.nih.gov) algorithms were used to query human expressed sequence tag (EST)¹ and genomic sequence databases in GenBank™. Iterative searches using identified individual human EST sequences were used to confirm and extend sequence derived from a given "hit" and to arrive at a consensus sequence. Predicted human cDNAs corresponding to a novel γ-tryptase/prostasin homologue identified in this manner were used to interrogate non-human EST databases to identify murine homologues.
Amplification and Cloning of Human and Mouse cDNAs-Human pancreasin DNA sequence predicted from EST and genomic sequence was used to design PCR primer pairs, which were used to screen human tissue cDNA preparations for transcripts of the pancreasin gene. Selected amplimers were purified and sequenced by the general methods described in prior work (10) to confirm the identity of the fragments. A rapid amplification of cDNA ends (RACE) approach (15) was used to obtain cDNA encoding additional 3′ protein-coding sequence of pancreasin. This was then used to design PCR primers (5′-CCCAGCCAGGCCTGAGGACATGAGGCGGCC and 5′-AGGGTATTTGAGAGGGGAGGAAG) bracketing the full protein-coding sequence. With these primers, a 1046-bp pancreasin cDNA was amplified from human placental cDNA (Clontech, Palo Alto, CA), gel-extracted, cloned into pCR2.1 vector (Invitrogen, Carlsbad, CA), and sequenced. Determination of pancreasin cDNA sequence permitted establishment of intron-exon splice sites in genomic DNA encoding the pancreasin gene and generation of specific DNA probes for blotting studies. Similar approaches were used to obtain cDNA encoding mouse pancreasin. A PCR primer pair (5′-ATGAGGCAGCCCCACATCGCTGC and 5′-GCGGCCGCCTAGACGATCCTGAGCAGCAGTG) predicted from mouse 5′- and 3′-ESTs was used to amplify a 987-bp cDNA encoding a 328-residue, mouse prepropancreasin coding sequence, including a carboxyl-terminal extension not predicted by the human cDNA. This cDNA was obtained from reverse-transcribed mRNA from the urinary bladder of an adult C57BL/6 mouse.
DNA and Protein Sequence Comparisons-DNA sequencing was conducted by University of California at San Francisco's Biomolecular Resource Center using standard dideoxy techniques. DNA translation, multiple sequence alignment and dendrograms were generated using MacVector software (Oxford Molecular, Campbell, CA).
Molecular Modeling-A homology model of the pancreasin catalytic domain was constructed, assisted in part by an automated protein modeling tool and server (Swiss PDB Viewer and Swiss-Model, respectively) (16). Propeptide sequence and the carboxyl-terminal 11 residues were excluded from the model. Coordinates of the crystal-derived three-dimensional structure of human βII-tryptase (Protein Data Bank number 1AOL) (17), which is pancreasin's closest relative for which diffraction data are available, served as template for the model, which was optimized by idealizing bond geometry and removing unfavorable contacts.
Gene Structure and Chromosomal Mapping-Pancreasin cDNA was used to query GenBank™ genomic sequence databanks to identify genes with exons matching predicted cDNA sequence. Intron-exon splice junctions in identified genomic sequence were established using open reading frames and cDNA alignments by application of the "5′-GT . . . AG-3′" rule for initiating and ending introns, as in prior work from this laboratory (10,18). Identified genomic sequence was mapped to a specific human chromosomal region through LocusLink (available at www.ncbi.nlm.gov/LocusLink).
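The way the "5′-GT . . . AG-3′" rule resolves an otherwise ambiguous splice junction can be illustrated with a minimal sketch. The function names and the toy genomic and cDNA strings are hypothetical and are not taken from the pancreasin gene. The sketch handles a single intron and tests only the canonical GT and AG dinucleotides.

```python
def follows_gt_ag_rule(intron: str) -> bool:
    """True if a candidate intron begins with GT and ends with AG."""
    seq = intron.upper()
    return len(seq) >= 4 and seq.startswith("GT") and seq.endswith("AG")


def find_gt_ag_intron(genomic: str, cdna: str):
    """Locate the single intron whose removal from `genomic` yields `cdna`.

    Several intron placements can reproduce the same cDNA when the bases
    flanking the junction repeat; only placements obeying GT...AG are kept.
    Returns (start, end) as 0-based, end-exclusive coordinates, or None.
    """
    genomic, cdna = genomic.upper(), cdna.upper()
    intron_length = len(genomic) - len(cdna)
    if intron_length < 4:
        return None
    for start in range(len(cdna) + 1):
        end = start + intron_length
        spliced = genomic[:start] + genomic[end:]
        if spliced == cdna and follows_gt_ag_rule(genomic[start:end]):
            return start, end
    return None


# Toy example (hypothetical sequences): two 6-nt exons around a 12-nt intron.
toy_genomic = "ATGGCC" + "GTAAGTTTTCAG" + "GGTTAA"
toy_cdna = "ATGGCCGGTTAA"
junction = find_gt_ag_intron(toy_genomic, toy_cdna)
```

In this toy case, shifting the intron one base downstream also reproduces the cDNA, but that placement starts with TA rather than GT and is rejected.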
mRNA Blotting-To generate a pancreasin-specific probe, a 440-bp fragment of pancreasin cDNA was obtained from human pancreatic cDNA by reverse transcriptase-PCR using the following primer pair: 5′-GCAAAGACACCGAGTTTGGCTAC and 5′-AGGGTATTTGAGAGGGGAGGAAG. Blots containing purified, electrophoresed mRNA from a variety of human tissues (Clontech) were hybridized with the ³²P-labeled, 440-bp pancreasin cDNA fragment, then subjected to autoradiography. The same blots were stripped and probed with radiolabeled γ-actin to control for differences in mRNA loading.
Antibody Generation-Rabbit polyclonal antisera were raised against synthetic peptides based on portions of predicted amino acid sequence corresponding to hypothesized catalytic domain surface loops (see results under "Molecular Modeling"). Two peptides were synthesized (CRNTSETSLYQVLLG and CGYQKPTIKNDMLCA) containing residues 78-91 and 202-214, respectively, of prepropancreasin. Both peptides were conjugated via the amino-terminal cysteines to keyhole limpet hemocyanin and injected into rabbits. Resulting antisera were screened and titered by enzyme-linked immunoadsorbent assay. Peptide synthesis, conjugation, immunizations, bleeding, and titering assays were conducted by GeneMed Synthesis (South San Francisco, CA). The IgG fraction of rabbit immunoglobulins was purified from delipidated antisera on a HiTrap Protein A HP column (Amersham Biosciences, Piscataway, NJ). Column-bound IgG was eluted using glycine-HCl (0.1 M, pH 2.7).
Cell Culture-The human pancreatic ductal carcinoma cell line HPAC (19) was obtained from the American Type Culture Collection (Manassas, VA) and cultured according to the vendor's recommendations in medium containing 95% of a 1:1 mixture of Dulbecco's modified Eagle's medium and Ham's F-12 medium with 1.2 g/liter NaHCO₃, 15 mM HEPES, 2 mg/liter insulin, 5 mg/liter transferrin, 40 µg/liter hydrocortisone, and 10 µg/ml epidermal growth factor, and 5% fetal bovine serum. Chinese hamster ovary (CHO) cells were grown in Ham's F-12 medium supplemented with 10% fetal bovine serum.
Immunocytochemical Analysis of Pancreatic Ductal Carcinoma Cells-HPAC cells were harvested by trypsinization, washed and suspended in PBS, and centrifuged onto glass slides. Slides were air-dried, immersed in methanol followed by acetone (20 min at −20°C for each solvent), rinsed with PBS, then incubated for 1 h with blocking solution containing 5% nonfat dry milk, 3% normal goat serum, 0.1% Triton X-100, and 1% glycine in PBS. Blocked slides were incubated overnight at 4°C with various dilutions of rabbit preimmune IgG or anti-pancreasin IgG, washed with PBS containing 0.05% Tween 20, incubated for 1 h with fluorescein-conjugated goat anti-rabbit IgG (Vector Laboratories, Burlingame, CA), and washed again with PBS/Tween 20. Slides were coverslipped in the presence of Vectashield medium (Vector Laboratories) and imaged by fluorescence microscopy.
Expression of Recombinant Pancreasin in CHO Cells-Pancreasin cDNA cloned into pCR2.1 TOPO T/A vector served as a template for further constructs. Pancreasin lacks a traditional nucleotide sequence (20) bracketing the initiator methionine ATG. Therefore, to boost the prospects of heterologous expression in CHO cells, the wild-type sequence 5′-GACATGAA was replaced with optimized sequence 5′-GCCATGGG and incorporated into a PCR forward primer (5′-ACAACTAATTATTCGAAACGAGGAATTCGCCATGGGGCGGCCGGCGGCGGTGCCG) into which an EcoRI restriction site also was introduced to facilitate further cloning. The 3′ region of the pancreasin cDNA was also modified to encode a carboxyl-terminal histidine fusion tag to ease purification. This modification was achieved by replacing the native stop codon with 9 histidine codons followed by a new stop codon and an introduced NotI restriction site in a reverse primer (5′-GTTCGGGCCCAAGCTGGCGGCCGCTCAGTGATGATGGTGATGATGGTGATGATGCTTCTGGCCGCCCAACCTCG). Pancreasin cDNA was amplified by PCR from the pCR2.1-pancreasin template with these modified primers using the following conditions: 95°C for 10 min, followed by 35 cycles of 95°C for 30 s, 60°C for 30 s, and 72°C for 1 min. The resulting His9-tagged pancreasin amplimer was trimmed with EcoRI and NotI and ligated into similarly restricted pcDNA3.1 (Invitrogen). In preparation for transfection with pcDNA3.1-pancreasin, CHO cells were plated in a six-well dish at a density of 2.5 × 10⁵ cells per well. The cells were transfected by exposure to 5 µg of plasmid DNA per well plus LipofectAMINE 2000 (Invitrogen), according to the manufacturer's protocol. After 48 h of recovery, cells were aliquoted into 10-cm dishes and cultured for 1 week in the presence of 400 µg/ml G418 (Calbiochem, San Diego, CA) to select for transfected cells. Surviving colonies were pooled and incubated overnight in 175-cm² flasks without G418 in the same medium, which was then exchanged for low protein Opti-MEM I medium (Invitrogen).
After 3 days, supernatants were harvested and used for purification of recombinant, His9-tagged pancreasin.
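The design of the two expression primers can be checked with a short sketch. It assumes that the hyphens inside the printed primer sequences are line-break artifacts and removes them. The helper names are hypothetical. The code looks for the EcoRI site (GAATTC) next to the optimized start context in the forward primer. It then reads the reverse primer on the sense strand to count the histidine codons that precede the new stop codon and the NotI site (GCGGCCGC).

```python
# Primer sequences as printed in the text, with line-break hyphens removed
# (an assumption about the typesetting, not a statement from the authors).
FORWARD_PRIMER = "ACAACTAATTATTCGAAACGAGGAATTCGCCATGGGGCGGCCGGCGGCGGTGCCG"
REVERSE_PRIMER = (
    "GTTCGGGCCCAAGCTGGCGGCCGCTCAGTGATGATGGTGATGATGGTGATGATG"
    "CTTCTGGCCGCCCAACCTCG"
)

ECORI_SITE = "GAATTC"
NOTI_SITE = "GCGGCCGC"
OPTIMIZED_START = "GCCATGGG"      # optimized sequence quoted in the text
HIS_CODONS = {"CAT", "CAC"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def his_codons_before_stop(sense: str, site: str = NOTI_SITE) -> int:
    """Count consecutive His codons directly upstream of the stop codon
    that immediately precedes `site` on the sense strand."""
    site_at = sense.find(site)
    if site_at < 3 or sense[site_at - 3:site_at] not in STOP_CODONS:
        return 0
    count, pos = 0, site_at - 6
    while pos >= 0 and sense[pos:pos + 3] in HIS_CODONS:
        count += 1
        pos -= 3
    return count


sense_3prime = reverse_complement(REVERSE_PRIMER)
has_ecori_then_start = (ECORI_SITE + OPTIMIZED_START) in FORWARD_PRIMER
his_tag_length = his_codons_before_stop(sense_3prime)
```

Under these assumptions the reverse primer encodes nine histidine codons, a TGA stop codon, and then the NotI site, in agreement with the description above.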
Purification of Recombinant Pancreasin Expressed by CHO Cells-Medium conditioned by pancreasin-transfected CHO cells was dialyzed against PBS. Imidazole was added to a concentration of 10 mM, and the resulting mixture was shaken overnight at 4°C with a slurry of nickel-nitrilotriacetic acid-agarose beads (Qiagen, Valencia, CA). After washing and equilibration with PBS containing 10 mM imidazole, beads were poured into a column and washed first with 10 column volumes of PBS containing 10 mM imidazole and 0.3 M NaCl and then successively with three column volumes of PBS/0.3 M NaCl containing 30 and 100 mM imidazole, respectively. Residual bound protein was eluted from the beads with PBS/0.3 M NaCl containing 1 M imidazole. Aliquots of eluted fractions were assayed for pancreasin immunoreactivity and peptidase activity and subjected to SDS-PAGE.

¹ The abbreviations used are: EST, expressed sequence tag; NA, nitroanilide; DISP, distal intestinal serine protease; CHO, Chinese hamster ovary; HPAC, human pancreatic adenocarcinoma; RACE, rapid amplification of cDNA ends; UTR, untranslated region; PBS, phosphate-buffered saline; rhpancreasin, recombinant human pancreasin.

RESULTS AND DISCUSSION
cDNA and Deduced Amino Acid Sequence of Prepropancreasin-Screening of human and rodent EST databases in GenBank™ revealed several cDNAs encoding fragments of human (e.g. AI272325, AA321681, and AA368960) and murine (e.g. AI070303, BB627930, and BB11542) homologues of prostasin and γ-tryptase that have not been described previously. Human EST-based primers were used to obtain more complete protein coding sequence of a cDNA encoding a novel protein, which we term pancreasin based on the features of the predicted translation product, as described below. Comparisons with partial sequence AX001350 predict that the pancreasin transcript contains a 5′-untranslated region (UTR) of at least 198 nucleotides. 3′-RACE reveals a 3′-UTR of 571 nucleotides containing a polyadenylation signal and poly(A) tail (Fig. 1). The cDNA encodes a predicted 290-amino acid preproprotease and is highly similar to the sequence of a predicted gene product termed marapsin (GenBank™ AJ306593). The deduced amino acid sequence of prepropancreasin/marapsin begins with a typical 22-amino acid signal peptide, which predicts that the nascent protein is directed initially to the endoplasmic reticulum and, like most serine proteases, is secreted outside of the cell. Indeed, as discussed below, the behavior of recombinant prepropancreasin expressed in CHO cells supports this prediction. Immediately following the signal peptide is a 12-residue pro (activation) peptide ending in Arg, followed by a 256-amino acid serine protease catalytic domain. The amino-terminal residue of the predicted mature protease after propeptide hydrolysis at Arg-34 is methionine, which appears to be unique among trypsin family serine proteases, the great majority of which have an isoleucine at this position. The amino-terminal residue dives into the hydrophobic interior of mature proteases, allowing the positively charged α-amino moiety to form a salt bridge with the negatively charged carboxylate side chain of a highly conserved aspartate (residue 228 of prepropancreasin), thereby bringing the residues involved in catalysis into productive alignment. Studies of trypsin involving mutation of the residue equivalent to pancreasin's Met-35 suggest that activity of the mature enzyme is preserved with a variety of amino acids containing aliphatic side chains (i.e. isoleucine, valine, and alanine) at that position, although methionine itself was not examined (22). Our finding that recombinant pancreasin is catalytically active as a tryptic protease indicates that the amino-terminal methionine side chain is tolerated in the binding pocket, although perhaps with assistance from structural accommodations peculiar to pancreasin. Methionine's presence in this critical position could render pancreasin susceptible to oxidative modification to the sulfoxide, especially in the zymogen form, in which the Met-35 side chain should be more exposed at the protein surface than in the enzyme's mature, active conformation.

FIG. 1. Human pancreasin predicted primary structure and post-translational processing. A, pancreasin cDNA and deduced amino acid sequence, beginning with the predicted initiator methionine and ending with the poly(A) tail. Predicted signal and pro (activation) peptides are italicized and underlined, respectively. The "catalytic triad" residues common to all active serine proteases are boxed. The 3′-UTR continues after the stop codon until the polyadenylation site, which is preceded by a conventional polyadenylation signal (underlined). B, results of Janin hydrophobicity analysis (as implemented in MacVector) of the 290-residue human prepropancreasin amino acid sequence compared with those of the predicted 328-residue mouse enzyme. Although both proteases contain an amino-terminal hydrophobic sequence typical of a signal peptide, only the substantially longer mouse protein contains a carboxyl-terminal hydrophobic sequence (arrow). The mouse pancreasin carboxyl-terminal hydrophobic sequence is similar to that found in prostasin and other close relatives. C, the predicted structure of mature, processed human pancreasin. The cDNA sequence predicts that pancreasin is translated initially as a single-chain, 290-amino acid precursor with a signal peptide (Pre), propeptide (Pro), and catalytic domain. We predict that the signal peptide is removed co-translationally in the endoplasmic reticulum, leaving a proenzyme, which is activated subsequently by hydrolysis of the propeptide segment at Arg-34, leaving a 256-residue catalytic domain (heavy chain) that remains attached to the propeptide segment (light chain) via a disulfide linkage involving Cys-26 and Cys-144, as shown. The locations of other Cys-Cys pairs are shown, by analogy to pairings involving homologous cysteines in trypsin. Consensus N-linked glycosylation sites are found in two positions, Asn-55 and Asn-79. Also shown are positions of the "catalytic triad" residues found in all active serine proteases (His-75, Asp-124, and Ser-229, corresponding to His-57, Asp-102, and Ser-195 using standard chymotrypsinogen numbering) and specificity-determining Asp-223, which is characteristic of trypsin-family serine proteases with specificity for lysine and arginine in peptide and protein targets.
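The segment lengths quoted in the text and in the legend to Fig. 1 (a 22-residue signal peptide, a 12-residue propeptide, and a 256-residue catalytic domain) can be checked against the residue numbers cited for individual positions. The following is a minimal bookkeeping sketch. The `Segment` class and `layout` helper are hypothetical names introduced for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    name: str
    start: int  # 1-based, inclusive, prepropancreasin numbering
    end: int    # 1-based, inclusive

    def contains(self, residue: int) -> bool:
        return self.start <= residue <= self.end


def layout(lengths):
    """Turn ordered (name, length) pairs into consecutive Segments."""
    segments, position = [], 1
    for name, length in lengths:
        segments.append(Segment(name, position, position + length - 1))
        position += length
    return segments


# Lengths as stated in the text.
signal, pro, catalytic = layout(
    [("signal peptide", 22), ("propeptide", 12), ("catalytic domain", 256)]
)

# Positions cited in the text: Cys-26 and Arg-34 (propeptide),
# Met-35, His-75, Asp-124, Asp-223, and Ser-229 (catalytic domain).
propeptide_sites = [26, 34]
catalytic_sites = [35, 75, 124, 223, 229]
```

With these lengths the propeptide spans residues 23-34 and the catalytic domain spans residues 35-290, so hydrolysis after Arg-34 leaves Met-35 as the new amino terminus and the precursor totals 290 residues.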
The primary structure of pancreasin's prosequence and proximal catalytic domain, including Arg-34 and idiosyncratic Met-35, is supported by the sequences predicted for chimpanzee and mouse pancreasin (Fig. 2), as deduced from genomic DNA and cDNA, respectively. Furthermore, rat EST BF551850, which encodes the amino terminus of putative rat pancreasin and is 77% identical (49 of 64 amino acid residues) to human pancreasin in the region of overlap, also contains these features. This suggests that mammalian pancreasins are activated by tryptic hydrolysis at Arg-34 and that Met-35, although apparently unique among serine proteases, is a conserved and possibly essential feature of pancreasins. As expected of a catalytically competent serine protease, the pancreasin catalytic domain possesses all three of the essential "catalytic triad" residues (using standard chymotrypsinogen numbering: His-57, Asp-102, and Ser-195) conserved in all serine proteases, as well as an aspartate at the base of the primary specificity pocket in position 189, found in all proteases of tryptic specificity (Fig. 2). This is consistent with pancreasin's observed cleavage site preference for substrates with P1 arginine (see below). Because pancreasin is predicted to be activated by tryptic hydrolysis at Arg-34 and is itself a tryptic enzyme, it may catalyze its own activation. This possibility is consistent with the finding of active pancreasin in medium conditioned by transfected CHO cells. However, the actual site of activation (intracellular versus extracellular) and the mechanism (autoactivation versus other) remain to be established. Pancreasin may be secreted initially as a zymogen, where it also could be activated in the pancreatic ductal lumen by trypsin or glandular kallikrein. The predicted site of propeptide hydrolysis at Arg-34 is not preceded by the series of aspartate residues required for recognition by enteropeptidase, which, therefore, is unlikely to activate propancreasin. 
The secreted material would be in a position to interact with other proteins in pancreatic secretions and with potential protein targets on the apical surface of epithelial cells lining the pancreatic ducts.
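The identity figures quoted for partial sequences, such as 77% for rat EST BF551850 (49 of 64 residues), are simple ratios over the aligned region. The sketch below shows one way to compute such a value from a pairwise alignment. The function name and the toy alignment are hypothetical. Columns containing a gap are skipped, which is one common convention and may differ from the one used by the authors' software.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identical residues over gap-free columns of an alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (x, y) for x, y in zip(aligned_a, aligned_b) if "-" not in (x, y)
    ]
    if not columns:
        raise ValueError("no gap-free columns to compare")
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


# Ratios reported in the text, rounded to whole percentages.
rat_amino_terminal = round(100 * 49 / 64)      # 49 of 64 residues
rat_carboxyl_terminal = round(100 * 87 / 111)  # 87 of 111 residues

# Toy alignment (hypothetical sequences): four of five gap-free columns match.
toy_identity = percent_identity("MVGA-L", "MVSAQL")
```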
Two consensus N-linked glycosylation sites, which lie in the catalytic domain (see Figs. 1 and 2), predict that the mature, active enzyme is glycosylated. Results of SDS-PAGE and immunoblotting of recombinant pancreasin support this prediction, as discussed below. The pancreasin cDNA open reading frame predicts 11 cysteines in the preproenzyme, one of which is in the signal peptide and therefore will be unavailable to participate in disulfide linkages in the mature protein. Based on alignments with trypsin and other serine proteases with Cys-Cys pairings established in crystal-derived tertiary structures, each pancreasin Cys can be paired with another. Thus, no unpaired cysteines are expected to be available to form intermolecular disulfide linkages. By analogy to chymotrypsinogen, Cys-26 of the propeptide is linked to Cys-144 in the catalytic domain. This predicts that the propeptide of pancreasin, like that of mature, active chymotrypsin, remains attached to the catalytic domain after hydrolysis of Arg-34. Predicted specific Cys-Cys pairings for pancreasin are shown in Fig. 1C.

Chimpanzee pancreasin cDNA and amino acid sequences were deduced from GenBank™-deposited draft genomic sequence (AC097329.1), and the amino acid sequence is 98% identical to that of the human preproenzyme. Mouse pancreasin amino acid sequence was deduced from cloned and sequenced cDNA and is 80% identical to human pancreasin in the overlapping catalytic domain. Amino acids identical in all six proteases are marked with an asterisk (*); residues that are similar but not identical in the proteases are marked with a period (.). The predicted amino terminus of the mature catalytic domain, after tryptic hydrolysis at an arginine residue conserved in all of these proteases, is marked with a plus sign (+). Predicted N-glycosylation sites in pancreasins are underlined. Residues constituting the "catalytic triad" common to all serine proteases are in boldface. The key aspartate conferring tryptic specificity is underlined and in boldface. Note that mouse pancreasin, as well as the two frog proteases and γ-tryptase, contain hydrophobic, carboxyl-terminal extensions compared with the primate pancreasins, which therefore appear to differ from the other enzymes in not being synthesized in a membrane-anchored form. Xepsin's carboxyl-terminal extension has been truncated for clarity.
The prepropancreasin open reading frame ends in a stop codon in a position that is 11 residues beyond that of the corresponding carboxyl-terminal region of soluble β-tryptases (23). However, this carboxyl-terminal extension is much shorter and less hydrophobic (see Fig. 1B) than the putative membrane-anchoring carboxyl-terminal domains of otherwise similar channel-activating protease (24), prostasin (4), testisin (6), γ-tryptase (10), and distal intestinal serine protease (25). Hydropathy analysis (Fig. 1B) suggests that the pancreasin carboxyl-terminal extension is too short and hydrophilic to form a membrane-spanning helix. Our finding of secretion of pancreasin from CHO cells further supports the predicted lack of a membrane-anchoring segment in the human enzyme. Interestingly, the predicted carboxyl-terminal extension of mouse pancreasin (see Fig. 2) is 35 amino acid residues longer than that of the human enzyme. Hydropathy analysis (Fig. 1B) predicts that the mouse extension does form a transmembrane anchor. Similarly, the amino acid sequence deduced from a set of overlapping rat ESTs (AI070303, AI716503, and AI575237) encoding the carboxyl-terminal portion of putative rat pancreasin is 78% identical (87 of 111 overlapping amino acid residues) to human pancreasin in this region. Nonetheless, like the mouse sequence, it contains a hydrophobic open reading frame that extends 35 amino acid residues beyond that of the human sequence. Thus, pancreasins in rodents may contain a membrane anchor, even if primate pancreasins do not. The presence of a membrane anchor could greatly influence protease function by limiting the spectrum of targets to proteins that are in the plasma membrane or directly in contact with it. Without an anchor, human pancreasin may diffuse away from its cell of origin to reach more remote targets. In this regard it should be noted that some proteases, e.g. prostasin, thought to be synthesized initially with a membrane anchor subsequently might be solubilized by cleavage of the anchor at the membrane surface (4).
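The kind of hydropathy reasoning used above can be sketched with a sliding-window average. Note the substitution: the analysis in this paper used the Janin scale as implemented in MacVector, whereas the sketch uses the Kyte-Doolittle scale. The 19-residue window and the 1.6 threshold are that scale's conventional heuristic for a membrane-spanning segment and are assumptions here, not parameters reported by the authors. The toy sequences are hypothetical.

```python
# Kyte-Doolittle hydropathy values (substituted for the Janin scale used
# in the paper; chosen only for illustration).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19):
    """Mean hydropathy of every full window along the sequence."""
    seq = seq.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[residue] for residue in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


def may_span_membrane(seq: str, window: int = 19, threshold: float = 1.6):
    """Heuristic: some full window exceeds the threshold mean hydropathy.

    A segment shorter than the window cannot qualify, which mirrors the
    argument that a short, hydrophilic extension cannot form an anchor.
    """
    if len(seq) < window:
        return False
    return max(hydropathy_profile(seq, window)) > threshold


# Toy sequences (hypothetical): a long hydrophobic stretch, a hydrophilic
# stretch of similar length, and a hydrophobic stretch that is too short.
hydrophobic_tail = "L" * 25
hydrophilic_tail = "DKSE" * 6
short_tail = "LLLLLLLLLLL"
```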
Relationship to Other Serine Proteases-Searches of protein sequence databases with the full-length deduced human pancreasin sequence as the query reveal homology with several published and partly characterized proteins, the most closely related of which are the Xenopus epidermis-specific protease xepsin (26) and embryonic serine protease-1 (27), mammalian prostasins (4), Xenopus channel-activating protease (24), mouse distal intestinal serine protease (DISP) (25), and mammalian γ-tryptases (9,10). As shown in Fig. 3, dendrograms prepared from alignments of pancreasin catalytic domain alone (to avoid the biasing effects of comparing proteases with and without available preprosequence and carboxyl-terminal extension) continue to reveal strong homology with xepsin and embryonic serine protease, which are of unknown function in frogs. However, the most closely related catalytic domain is that of brain-specific protease-2, a partially sequenced serine protease of unknown function from rat hippocampus (28). Rat brain-specific protease-2 is highly similar to the predicted protein product of an uncharacterized human gene (SP001LA, predicted from genomic sequence; GenBank™ accession number AC003965), which is distinct from the pancreasin gene, but which also maps to chromosome 16. Furthermore, as noted, the partial sequence of a much more closely related rat gene product is predicted from EST libraries. Thus, pancreasin and brain-specific protease genes are not orthologous. However, it is possible that xepsin or embryonic serine protease-1 is pancreasin's orthologue in frogs. Despite a number of shared features (including gene structure, as discussed below), the γ-tryptases, DISP, channel-activating proteases, prostasins, and testisins are less closely related to pancreasin and therefore are less likely to serve similar functions.
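A dendrogram of this kind rests on two steps: a pairwise distance equal to the fraction of mismatched residues in aligned catalytic domains, followed by clustering with the unweighted pair group method with arithmetic mean (UPGMA). The sketch below is a bare-bones version that returns only the tree topology as nested tuples. It is not MacVector's implementation, it omits branch lengths, and the four toy sequences are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of mismatched residues over gap-free alignment columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(x, y) for x, y in zip(a, b) if "-" not in (x, y)]
    return sum(x != y for x, y in columns) / len(columns)


def upgma(aligned: dict):
    """Cluster aligned sequences by UPGMA; return a nested-tuple topology."""
    sizes = {name: 1 for name in aligned}  # cluster label -> member count
    dist = {
        frozenset(pair): p_distance(aligned[pair[0]], aligned[pair[1]])
        for pair in combinations(aligned, 2)
    }
    while len(sizes) > 1:
        # Join the two closest clusters.
        a, b = min(combinations(sizes, 2), key=lambda p: dist[frozenset(p)])
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        merged = (a, b)
        # Distance to the new cluster is the size-weighted mean of the
        # distances to its two parts (the "arithmetic mean" in UPGMA).
        for other in sizes:
            dist[frozenset((merged, other))] = (
                size_a * dist[frozenset((a, other))]
                + size_b * dist[frozenset((b, other))]
            ) / (size_a + size_b)
        sizes[merged] = size_a + size_b
    return next(iter(sizes))


# Toy alignment (hypothetical sequences, not real protease domains).
toy_alignment = {
    "A": "MKLVGA",
    "B": "MKLVGS",
    "C": "MRIVAS",
    "D": "TRIPAS",
}
toy_tree = upgma(toy_alignment)
```

In the toy data, A and B differ at one position and C and D at two, so each pair is joined before the two pairs are joined to each other.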
FIG. 3. Dendrogram of pancreasins and related proteases. The amino acid sequences of the catalytic domains of human, chimpanzee, and mouse pancreasin and their closest relatives were subjected to tree analysis using the unweighted pair group with arithmetic mean multiple sequence alignment algorithm in MacVector 7.1. Prepropeptides and carboxyl-terminal extensions were excluded from the alignment to limit distortions in phylogenetic distance created by variations in length of sequence on either side of the catalytic domain. The length of each branch of the tree is proportional to the fraction of mismatched amino acids in pairs of aligned sequences. The pancreasin branch is depicted in heavy black lines. The nearest relatives to pancreasins are rat brain-specific protease (BSP) 2 (an uncharacterized, partially sequenced gene product of unknown function) and two Xenopus proteases, xepsin and embryonic serine protease (ESP)-1. The "tryptase" group (which includes mouse distal intestinal protease, DISP, as well as α-, β-, and γ-tryptases) forms a separate branch, as do the prostasins and testisins. The sequence of human pancreatic trypsin is included as an outlier from this otherwise fairly closely related collection of proteases. Pancreasins are unlikely to be orthologues of rat BSP-2 because: (a) the magnitude of mismatch is too great, (b) ESTs suggest the existence of rat proteins even more closely matched to pancreasin, and (c) GenBank™-deposited human genomic sequence contains a gene that is a better candidate as a BSP-2 orthologue. The GenBank™ accession numbers of the sequences used to generate this tree are as follows: human pancreasin (AY030095), chimp pancreasin (AC097329), mouse pancreasin (BB627930 and BB11542), rat BSP-2 (AJ005642), frog xepsin (AB018694), frog embryonic serine protease, ESP-1 (AB038496), human γ-tryptase (AF191031), mouse γ-tryptase (AF175760), mouse DISP (AJ243866), human α-tryptase (M30038), human βI-tryptase (M33491), mouse tryptase-1/MCP-6 (M57626), human prostasin (L41351), mouse prostasin (BC003851), frog channel-activating protease, CAP (AF029404), human testisin (AF058300), mouse testisin (AY005145), and human trypsin (M22612).

Molecular Model-The homology model constructed from human βII-tryptase as a starting point is shown in Fig. 4. This model predicts that the topography of charged and uncharged amino acids in the vicinity of the substrate binding and catalytic sites is unique compared with that of tryptase and other serine proteases. Because binding of potential substrates and inhibitors involves contacts with amino acid side chains in this region, the differences between pancreasin and other proteases predict differences in substrate specificity and inhibitor susceptibility. On the other hand, the position of pancreasin Asp-223 in the model predicts that the enzyme's primary specificity (i.e. its preference for the P1 residue on the amino-terminal side of the scissile bond) is for basic residues, as was found to be true of the recombinant enzyme (see below). The model also shows that both consensus N-linked carbohydrate attachment sites lie fairly close to the binding site of the P′ (carboxyl-terminal) side of peptide substrates. Therefore, attached sugars may narrow the spectrum of potential substrates and inhibitors by impeding access to the active site. Due to a modest excess of amino acids with acidic side chains over those with basic side chains, the pancreasin catalytic domain is predicted to be acidic, with a net charge at pH 7 of approximately −3, not counting any additional negative charge contributed by N-linked carbohydrates. However, if the complete propeptide (which contains four basic residues) remains attached to the mature enzyme, the net charge of the two-chain complex is +1, and the complex is thus slightly basic. Although the mature enzyme is not predicted to be strongly cationic, the model suggests that there are patches of positive charge that could bind to polyanions such as heparin and related glycosaminoglycans and proteoglycans.
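The charge bookkeeping in the preceding paragraph (a catalytic domain at about −3 plus a propeptide carrying four basic residues, giving +1) can be illustrated with a minimal sketch. It assumes a deliberately crude model in which Lys and Arg count as +1 and Asp and Glu as −1 at pH 7, with histidine, chain termini, and carbohydrate ignored. The two sequences are hypothetical placeholders chosen only to reproduce the counts stated in the text; they are not pancreasin sequence, and the function name is likewise illustrative.

```python
# Simplified side-chain charges at pH 7: Lys/Arg = +1, Asp/Glu = -1.
# His, chain termini, and N-linked glycans are ignored (an assumption).
SIDE_CHAIN_CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def net_side_chain_charge(sequence: str) -> int:
    """Return the basic-minus-acidic residue count for a one-letter sequence."""
    return sum(SIDE_CHAIN_CHARGE.get(residue, 0) for residue in sequence.upper())


# Hypothetical placeholder sequences, NOT pancreasin: the toy domain has three
# more acidic than basic residues, and the toy propeptide has four basic
# residues and no acidic ones, mirroring the counts given in the text.
toy_catalytic_domain = "IVGGDEAEDWK"
toy_propeptide = "AKRGSKLR"

domain_charge = net_side_chain_charge(toy_catalytic_domain)
propeptide_charge = net_side_chain_charge(toy_propeptide)
two_chain_charge = domain_charge + propeptide_charge
print(domain_charge, propeptide_charge, two_chain_charge)
```

A residue count of this kind says nothing about how charge is distributed over the surface, which is why the homology model is needed to identify local patches of positive charge.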
Location and Organization of the Pancreasin Gene-Interrogations of GenBank™ with query sequences based on pancreasin cDNA identified highly homologous sequences in a 40.2-kb cosmid clone of human genomic DNA (accession number AC004036). This clone is part of a contig localizing to chromosome 16p, the same chromosomal arm containing α-tryptase (TPS1), β-tryptases (e.g. TPSB1), γ-tryptase (TPSG1), and testisin (PRSS21) genes (7,9,10,18,29). The pancreasin gene appears to reside on the centromeric side of the tryptase locus and on the telomeric side of the testisin locus (i.e. between the two), although there are persistent ambiguities in mapping data in this region. The predicted organization of the pancreasin gene is shown in Fig. 5 and compared with that of related proteases. Its most distinctive feature is distribution of the "prepro" coding segments among three different exons, including exon 2, which is only 27 bp. This pattern, including the phase and placement of the first intron and the small size of the second exon, is described only in genes encoding pancreasin's close relatives, such as prostasin (5), γ-tryptase (10), testisin (7), and DISP (25). Pancreasin's first, third, and fifth introns are large compared with those of its relatives, due, in part, to insertion of Alu repetitive sequences not present in the other genes (Fig. 5).

FIG. 5. … and Ser (S) catalytic triad residues are indicated. The phase of each intron (0, I, or II) is shown. Note the similarity in phase and placement of introns in pancreasin, prostasin, testisin, and γ-tryptase genes, each of which contains a preprosequence divided among three exons. This division is a distinctive feature of this group of protease genes but not of genes encoding α/β-tryptases. Pancreasin's first, third, and fifth introns are large compared with those of the other genes. The third and fifth introns contain an Alu-type repetitive element not present in the corresponding intron in the other genes. The prostasin, testisin, and γ-tryptase genes each contain an extended 3′-open reading frame encoding a putative transmembrane segment and small cytoplasmic tail. This transmembrane segment is not present in human pancreasin or α/β-tryptases. These findings suggest close evolutionary relationships among these genes.

FIG. 6. Blotting of tissue mRNA. Electrophoresed poly(A) mRNA from multiple human tissues was transferred to nitrocellulose, hybridized at high stringency with a radiolabeled probe prepared from a 440-bp portion of pancreasin cDNA, then subjected to autoradiography (upper panels). After stripping, the blot was hybridized with a radiolabeled actin probe, which serves as a control for mRNA loading and integrity (lower panels). With the pancreasin probe, a major band is seen only in the lane containing pancreatic mRNA, suggesting that the pancreas has high steady-state levels of pancreasin mRNA compared with other tissues. Dual bands of actin hybridization seen in the heart and skeletal muscle lanes are due to hybridization with muscle isoforms of actin.

FIG. 4. Homology model of pancreasin. A model of the predicted pancreasin catalytic domain was generated starting from the crystallographically derived structure of human βII-tryptase. The propeptide and the carboxyl-terminal 11 residues of pancreasin were omitted because they have no counterpart in the tryptase structure. In the model shown, the catalytic triad residues (His-75, Asp-124, Ser-229) in the active site are red, basic residues (lysine and arginine) are blue, acidic residues (aspartate and glutamate) are green, and predicted N-linked carbohydrate (CHO) attachment sites (Asn-55 and Asn-89) are cyan. Note that sugars (not shown) attached to Asn-55 and Asn-89 could influence access and binding of substrates in the active site. The ribbon structure shows side chains of the catalytic triad residues and putative N-linked asparagines. "Front" views show the active site face on, with the extended substrate-binding site oriented roughly vertically. "Side" views depict the active site binding cleft in profile, with the carbohydrate attachment sites to the left. The distribution of surface side chains of basic and acidic residues suggests patches of positive charge, despite an overall excess of residues with acidic side chains.
Tissue Expression of Pancreasin mRNA-As shown in Fig. 6, hybridization of a pancreasin-specific probe with blotted mRNA from multiple tissues reveals a strong signal from the pancreas but not from any other tissue surveyed. However, more sensitive reverse transcriptase-PCR-based screening identifies the predicted 440-bp amplimer in several other tissues, with the strongest signals outside the pancreas coming from lung and placenta (Fig. 7). The same PCR primers also yield pancreasin-derived amplimers from HPAC pancreatic carcinoma cells (Fig. 7). In contrast, cDNA from the MRC5 line of human fibroblasts does not yield the amplimer (not shown), reinforcing the selectivity of pancreasin expression predicted by the mRNA blotting and reverse transcriptase-PCR findings.
Expression of Pancreasin Protein in Pancreatic Carcinoma Cells-As shown in Fig. 8, antibodies raised against synthetic pancreasin peptides reveal predominantly cytoplasmic reactivity in HPAC cells subjected to immunocytochemical analysis. Consistent with predictions from hydropathy analysis showing that the mature protein is not membrane-anchored, there is no strong staining of the cell surface. Some of the cells exhibit eccentric, perinuclear immunoreactivity, consistent with the presence of pancreasin in the Golgi apparatus or endoplasmic reticulum. Thus, pancreasin transcripts are translated into protein in a cell line derived from the tissue in which pancreasin expression is highest, based on the survey of tissue mRNA levels shown in Fig. 6.
Properties of Recombinant Human Pancreasin-As shown by the immunoblot in Fig. 9, CHO cells transfected with His9-tagged human pancreasin express a 40- to 41-kDa protein that binds strongly to our polyclonal antiserum raised against synthetic pancreasin peptides. This immunoreactive band was detected in pancreasin-transfected cells but not in control cells transfected with empty vector (not shown), suggesting that CHO cells do not natively express detectable amounts of pancreasin. Immunoreactive pancreasin is detected primarily in conditioned medium rather than in cell extracts, suggesting that recombinant human pancreasin is secreted by transfected CHO cells and not stored. The ~9-kDa reduction in size of the recombinant pancreasin band achieved by incubation with peptide N-glycosidase F indicates that pancreasin is N-glycosylated, likely at both of the predicted sites given the magnitude of the size reduction. The size of native pancreasin (without the His9 tag) should be slightly smaller than that of the recombinant […]

Substrate preferences of pancreasin in comparison with trypsin are shown in Fig. 10A. Based on hydrolysis of a sampling of peptidyl-NAs, pancreasin is more selective than trypsin in the types of tryptic substrates it hydrolyzes. However, like trypsin, it has no chymotryptic or elastolytic activity. The best substrate is tosyl-Gly-L-Pro-Arg-NA, which is also a good substrate for human β-tryptase. Unlike β-tryptase and trypsin (30), however, pancreasin has little activity toward tosyl-Gly-L-Pro-Lys-NA, which differs from tosyl-Gly-L-Pro-Arg-NA only in the P1 residue. Therefore, pancreasin appears to possess a strong preference for P1 Arg over Lys. Pancreasin also appears to prefer peptide rather than mono-amino acid amides, because it has less activity toward benzoyl-L-Arg-NA. This suggests that pancreasin has an extended binding site available for productive interactions with residues on the amino-terminal side of P1 Arg. Like tryptase, it tolerates (and may actually prefer) substrates with P2 Pro, a somewhat atypical preference among tryptic serine proteases (31). Further analysis is needed to establish subsite preferences for substrate residues P2-P4 as well as for residues on the P′ side of the scissile bond. Finally, our initial characterization of recombinant human pancreasin suggests that the enzyme does not require calcium or heparin for stability, in contrast to trypsin and tryptase, respectively. Indeed, pancreasin activity was undiminished by incubation for more than 3 h at 37°C.

FIG. 8. Immunocytochemical analysis of pancreasin expression in pancreatic ductal carcinoma cells. Cultured HPAC cells were harvested, cytospun onto slides, and incubated with 1:200 dilutions of pre-immune or anti-pancreasin polyclonal rabbit IgG, followed by incubation with fluorescein-conjugated anti-rabbit IgG secondary antibody. Images obtained by fluorescence microscopy of cells incubated with pre-immune IgG and anti-pancreasin IgG are shown in A and B, respectively (40× objective). Images were captured and processed with identical parameters. Note the strong cytoplasmic fluorescence obtained using anti-pancreasin antibody. These findings support HPAC expression of pancreasin that is not membrane-anchored or otherwise attached to the cell surface.

FIG. 7. Amplification of tissue-specific cDNA. Human cDNA from the indicated range of cells and tissues was amplified by PCR using primers (5′-GCAAAGACACCGAGTTTGGCTAC and 5′-AGGGTATTTGAGAGGGGAGGAAG) based on pancreasin exons 5 and 6. The amplification reaction was designed to cross intron 5 so that cDNA-derived amplimers are distinguished from products derived from any contaminating genomic DNA. The expected sizes of cDNA and genomic DNA amplimers are 440 and 1154 bp, respectively. Consistent with results of mRNA blotting in Fig. 6, the most intense bands were obtained from cDNA from human pancreas and from the human pancreatic carcinoma cell line, HPAC (left lane). However, strong bands also were obtained from lung and placenta. Weak bands were obtained from brain and liver. The right-most lane shows the major ~1.2-kb band expected from genomic DNA. Amplimers from lung and pancreas were isolated and sequenced, confirming their identity as pancreasin. Thus, HPAC cells actively transcribe the pancreasin gene, as do several tissues in addition to pancreas.
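The intron-spanning primer design described in the Fig. 7 legend lends itself to a short worked example. The sketch below uses only the two amplimer sizes stated there (440 bp from cDNA and 1154 bp from genomic DNA). The intron length it derives is an inference that assumes intron 5 is the only sequence separating the two templates between the primers. The function names and the 10% sizing tolerance are illustrative assumptions, not part of the reported method.

```python
# Expected amplimer sizes stated in the Fig. 7 legend.
CDNA_AMPLIMER_BP = 440
GENOMIC_AMPLIMER_BP = 1154


def implied_intron_length(genomic_bp: int, cdna_bp: int) -> int:
    """Intron length implied if one intron accounts for the whole size gap."""
    return genomic_bp - cdna_bp


def classify_band(observed_bp: float, tolerance: float = 0.10) -> str:
    """Assign a gel band to a template; the 10% tolerance is an assumption."""
    expected_sizes = (("cDNA", CDNA_AMPLIMER_BP), ("genomic", GENOMIC_AMPLIMER_BP))
    for label, expected in expected_sizes:
        if abs(observed_bp - expected) <= tolerance * expected:
            return label
    return "unassigned"


print(implied_intron_length(GENOMIC_AMPLIMER_BP, CDNA_AMPLIMER_BP))
print(classify_band(440), classify_band(1200), classify_band(800))
```

Under these assumptions the ~1.2-kb band in the genomic DNA lane is assigned to the genomic template, and the 440-bp bands in the tissue lanes are assigned to spliced cDNA.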
As shown in Fig. 10B, recombinant human pancreasin rather remarkably resists inactivation by large molecular mass inhibitors such as aprotinin and soybean trypsin inhibitor, which effectively inhibit pancreatic trypsin. Although broad resistance to proteinaceous inhibitors is rare among characterized mammalian serine proteases, it is a feature of β-tryptases, which are cousins of pancreasin, as shown in Fig. 3. Human β-tryptases resist large inhibitors by forming non-covalently associated, heparin-stabilized oligomers, which compartmentalize active sites within a central pore accessible only to smaller inhibitors (17). At least one relative of tryptase, canine mastin, forges disulfide links between catalytic subunits to stabilize the inhibitor-resistant conformation (32). Based on results of non-reducing SDS-PAGE (not shown), there is no evidence of the formation of intersubunit disulfide links in our preparations of pancreasin. However, the potential role of noncovalent oligomerization in pancreasin's resistance to aprotinin and inhibitors circulating in the bloodstream merits further investigation. Similarly, it will be helpful to identify low molecular weight inhibitors more potent than benzamidine for future pharmacological explorations of pancreasin function. Human pancreasin's resistance to aprotinin and its secretion as a soluble enzyme lessen the likelihood that it plays a channel-activating protease or prostasin-like role in regulating epithelial sodium channel function. This is because functional data from cultured epithelia suggest that the endogenous regulator of Na+ flux via the amiloride-sensitive sodium channel is sensitive to aprotinin and other large inhibitors and that a membrane anchor may be required for channel-activating function (24,33,34).
In conclusion, our data reveal the gene, cDNA, predicted protein structure, and activity profile of a novel, secreted serine protease expressed in several tissues but most strongly in the pancreas. Characterization of recombinant pancreasin reveals an active, inhibitor-resistant peptidase with a preference for hydrolysis of peptide substrates after arginine residues.