Molecular Cloning and Expression of Human and Mouse Tyrosylprotein Sulfotransferase-2 and a Tyrosylprotein Sulfotransferase Homologue in Caenorhabditis elegans

Tyrosine O-sulfation, a common post-translational modification in eukaryotes, is mediated by Golgi enzymes that catalyze the transfer of the sulfuryl group from 3′-phosphoadenosine 5′-phosphosulfate to tyrosine residues in polypeptides. We recently isolated cDNAs encoding human and mouse tyrosylprotein sulfotransferase-1 (Ouyang, Y. B., Lane, W. S., and Moore, K. L. (1998) Proc. Natl. Acad. Sci. U. S. A. 95, 2896–2901). Here we report the isolation of cDNAs encoding a second tyrosylprotein sulfotransferase (TPST), designated TPST-2. The human and mouse TPST-2 cDNAs predict type II transmembrane proteins of 377 and 376 amino acid residues, respectively. The cDNAs encode functional N-glycosylated enzymes when expressed in mammalian cells. In addition, preliminary analysis indicates that TPST-1 and TPST-2 have distinct specificities toward peptide substrates. The human TPST-2 gene is on chromosome 22q12.1, and the mouse gene is in the central region of chromosome 5. We have also identified a cDNA that encodes a TPST in the nematode Caenorhabditis elegans that maps to the right arm of chromosome III. Thus, we have identified two new members of a class of membrane-bound sulfotransferases that catalyze tyrosine O-sulfation. These enzymes may catalyze tyrosine O-sulfation of a variety of protein substrates involved in diverse physiologic functions.

Tyrosine O-sulfation is a post-translational modification of membrane and secretory proteins that occurs in all eukaryotic organisms (1-3). Many proteins have been shown to contain tyrosine sulfate. Among these are proteins involved in inflammation (4,5) and hemostasis (6-12), subcellular matrix proteins (13-17), and many others (2,3). Tyrosine O-sulfation is known to be important for the biological function of coagulation factors VIII and V (7,8,18,19), P-selectin glycoprotein ligand-1 (PSGL-1) (4), platelet glycoprotein Ibα (9,10), complement factor C4 (5), and hirudin (20). However, a functional role for tyrosine O-sulfation has not been established for the majority of the proteins known to have this modification.
Tyrosine O-sulfation is mediated by tyrosylprotein sulfotransferase (TPST), which catalyzes the transfer of the sulfuryl group from 3′-phosphoadenosine 5′-phosphosulfate (PAPS) to tyrosine residue(s) within highly acidic motifs of polypeptides (2,21). Biochemical evidence indicates that the enzyme is a membrane-associated protein with a lumenally oriented active site localized in the trans-Golgi network (22,23). We recently purified a TPST from rat liver microsomes and cloned human and mouse cDNAs that encode this enzyme activity, which we now designate TPST-1 (24). The human and mouse TPST-1 cDNAs encode N-glycosylated proteins of 370 amino acids with type II transmembrane topology and are broadly expressed in mammalian tissues as assessed by Northern blotting. In this paper we report the molecular cloning and expression of human and mouse cDNAs encoding a second mammalian TPST, designated TPST-2, and a TPST from the nematode Caenorhabditis elegans, designated TPST-A.

EXPERIMENTAL PROCEDURES
TPST Assay-TPST activity was determined by measuring the transfer of [35S]sulfate from [35S]PAPS (NEN Life Science Products) to immobilized peptide substrates, as described previously (24). Peptides were synthesized by Biosynthesis, Inc. (Lewisville, TX) and were linked (2.1-3.0 mol/ml resin) via a COOH-terminal cysteine residue to iodoacetamide-activated resin (UltraLink™ Iodoacetyl, Pierce). All assays were performed in duplicate. One unit of activity was defined as 1 pmol of product formed per minute.
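The unit definition above can be illustrated with a minimal Python sketch. The function names and the numerical inputs are hypothetical and serve only to show how a specific activity in units/mg follows from the definition of 1 unit as 1 pmol of product formed per minute; they are not measurements from this study.

```python
def tpst_units(pmol_product, minutes):
    """Activity in units, where 1 unit = 1 pmol of product formed per minute."""
    if minutes <= 0:
        raise ValueError("incubation time must be positive")
    return pmol_product / minutes


def specific_activity(pmol_product, minutes, mg_protein):
    """Specific activity in units per mg of protein."""
    if mg_protein <= 0:
        raise ValueError("protein amount must be positive")
    return tpst_units(pmol_product, minutes) / mg_protein


def mean_of_duplicates(first, second):
    """Average of a duplicate determination."""
    return (first + second) / 2.0


# Hypothetical duplicate assay: 28 and 32 pmol of sulfated peptide formed
# in 30 min by 0.5 mg of extract protein (illustrative values only).
product = mean_of_duplicates(28.0, 32.0)
activity = specific_activity(product, 30.0, 0.5)
print(f"{activity:.2f} units/mg")
```

Averaging the duplicates before dividing by time and protein keeps the calculation in step with the statement that all assays were performed in duplicate.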
Expression of Recombinant TPST-2 in Mammalian Cells-The pcDNA3.1(+) vector (Invitrogen, Carlsbad, CA) was modified for expression of full-length TPST fusion proteins containing an NH2-terminal epitope for HPC4, a Ca2+-dependent monoclonal antibody to protein C, as described previously (24). Full-length human and mouse TPST-2 coding sequences were amplified by Taq polymerase (Promega, Madison, WI) using EST clones 810937 and 569461 as templates, respectively. The primers used were: top strand, 5′-CGGGATCCGCGCCTGTCGGTGCGTA-3′, and bottom strand, 5′-GGAATTCTGGAAATCACGAGCTTCC-3′. The cycling parameters were 25 cycles of denaturation at 94°C for 30 s, annealing at 55°C for 30 s, and extension at 72°C for 2 min. The polymerase chain reaction introduced a BamHI site in place of the native initiation codon and an EcoRI site after the termination codon. The products were gel purified, ligated into pGEM-T (Promega), and sequenced on both strands. The inserts were excised using BamHI and EcoRI and directionally cloned into unique BamHI and EcoRI sites in the multiple cloning site of the modified pcDNA3.1(+) vector. In the fusion proteins the native initiating methionine is replaced with 15 residues containing the HPC4 epitope (MEDQVDPRLIDGKDP).
The pcDNA3.1(+) vector was also modified for expression of soluble fusion proteins. The HindIII and BamHI fragment in the multiple cloning site of the vector was replaced with a 103-bp double-stranded oligonucleotide (Integrated DNA Technologies, Inc., Coralville, IA) with a 5′ HindIII half-site and a 3′ BamHI half-site containing an ideal Kozak sequence followed by the nucleotide sequence encoding the transferrin signal peptide and the HPC4 epitope. cDNAs encoding soluble forms of human and mouse TPST-2 were amplified by Taq polymerase using the full-length cDNAs as templates. The primers used for amplification of soluble TPST-2 were: top strand for human TPST-2, 5′-CGGGATCCAGGACAGCAGGTGCTAGAG-3′; top strand for mouse TPST-2, 5′-CGGGATCCAGGGCAGCAAGTACTGGAG-3′; and bottom strand for human and mouse TPST-2, 5′-GGAATTCTGGAAATCACGAGCTTCC-3′. The polymerase chain reaction introduced a BamHI site at the 5′ end and an EcoRI site after the termination codon. The cycling parameters were 25 cycles of denaturation at 94°C for 30 s, annealing at 55°C for 30 s, and extension at 72°C for 2 min. After the polymerase chain reaction both products were gel purified, digested with BamHI and EcoRI, and cloned into unique BamHI and EcoRI sites in the multiple cloning site. In the fusion proteins the native NH2-terminal 24 amino acids of TPST-2, including the cytoplasmic and transmembrane domains, were replaced with the transferrin signal peptide (MRLAVGALLVCAVLGLCLA) followed by the 12-residue HPC4 epitope. Thus, the NH2 terminus of both recombinant soluble enzymes is NH2-EDQVDPRLIDGKDPG25Q after signal peptide cleavage (the HPC4 epitope comprises the first 12 residues, EDQVDPRLIDGK). The predicted molecular masses of soluble human and mouse TPST-2 fusion proteins are 40,940 and 41,152 Da, respectively.
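As a quick consistency check on the cloning design described above, the following Python sketch locates the BamHI (GGATCC) and EcoRI (GAATTC) recognition sequences in the four primers and confirms the stated lengths of the fusion segments. The primer and peptide sequences are taken from the text; the dictionary keys and function name are labels chosen here for illustration.

```python
# Recognition sequences of the two enzymes used for directional cloning.
SITES = {"BamHI": "GGATCC", "EcoRI": "GAATTC"}

# Primer sequences as given in the text (5' to 3'); keys are illustrative labels.
PRIMERS = {
    "full_length_top": "CGGGATCCGCGCCTGTCGGTGCGTA",
    "soluble_human_top": "CGGGATCCAGGACAGCAGGTGCTAGAG",
    "soluble_mouse_top": "CGGGATCCAGGGCAGCAAGTACTGGAG",
    "bottom": "GGAATTCTGGAAATCACGAGCTTCC",
}


def site_positions(primer):
    """Return the 0-based index of each site in the primer, or -1 if absent."""
    return {name: primer.find(seq) for name, seq in SITES.items()}


for label, primer in PRIMERS.items():
    print(label, site_positions(primer))

# Peptide segments described in the text.
FULL_LENGTH_TAG = "MEDQVDPRLIDGKDP"           # replaces the initiating methionine
TRANSFERRIN_SIGNAL = "MRLAVGALLVCAVLGLCLA"    # transferrin signal peptide
HPC4_EPITOPE = "EDQVDPRLIDGK"                 # 12-residue HPC4 epitope
print(len(FULL_LENGTH_TAG), len(TRANSFERRIN_SIGNAL), len(HPC4_EPITOPE))
```

Each top-strand primer carries only the BamHI site and the bottom-strand primer only the EcoRI site, which is what allows the amplified inserts to be cloned in a single orientation.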
The human embryonic kidney cell line 293-T was grown in low glucose Dulbecco's modified Eagle's medium containing 10% fetal calf serum and 2 mM glutamine at 37°C and 5% CO2/95% air. Cells were transfected with empty vector or vector containing cDNAs encoding TPST fusion proteins using LipofectAMINE (Life Technologies, Inc.). Cell extracts and conditioned medium were processed as described previously and stored at −80°C (24).
Peptide Antibody Production-A peptide corresponding to residues 360-376 (CGYFQVNQVSTSPHLGSS) of mouse TPST-2 was synthesized on an Applied Biosystems model 431 peptide synthesizer (Tom Zamborelli, Amgen). The peptide was coupled to maleimide-activated keyhole limpet hemocyanin through the added NH2-terminal cysteine and injected into New Zealand White rabbits (Cocalico Biologicals, Inc., Reamstown, PA). Immune sera were collected and tested by Western analysis of extracts of 293-T cells transfected with mouse and human TPST-1 and TPST-2 cDNAs. The antiserum recognized two closely spaced polypeptides of ≈47 kDa in extracts of cells overexpressing full-length mouse and human TPST-2 but not in cells overexpressing mouse or human TPST-1.

RESULTS
Molecular Cloning of Human and Mouse TPST-2-The nucleotide and predicted amino acid sequences for human and mouse TPST-1 were used to perform reiterative searches of the EST data base using the TBLASTN and BLASTN algorithms. Excluding ESTs that aligned with TPST-1, 17 human and 23 mouse ESTs were identified that were aligned into separate contigs. The human and mouse TPST-2 contigs spanned open reading frames 1131 and 1128 nucleotides in length, respectively. I.M.A.G.E. Consortium cDNA clones (25) were purchased from Research Genetics (Huntsville, AL), and the nucleotide sequences of both strands were determined.
Mouse TPST-2 EST clone 569461 (GenBank™ accession number AA369474) had a 1760-bp insert containing a 156-nucleotide 5′-untranslated region (UTR), a 1128-nucleotide coding region, and a 476-nucleotide 3′-UTR. Human TPST-2 EST clone 810937 (GenBank™ accession number AA459614) had a 1854-bp insert. Alignment of the nucleotide sequences of human EST clone 810937 and mouse EST clone 569461 showed that the open reading frames were 89% identical. However, the alignment indicated that the human clone had a frameshift mutation due to the deletion of a guanosine at nucleotide 1200 that would result in premature termination of translation. This conclusion is supported by the following observations. An independent human TPST-2 EST clone (clone 307478, GenBank™ accession number W21315) was sequenced, and the published sequences of two additional ESTs that align to this region were compared (GenBank™ accession numbers H94110 and AA374022). None of the three ESTs had the frameshift mutation. In addition, a BAC clone (445C9, GenBank™ accession number Z95115) that contains the complete genomic sequence of the human TPST-2 gene (see below) also lacked the frameshift mutation. To construct a full-length human TPST-2 cDNA, the 1087-nucleotide 5′ end of EST clone 810937 and the 768-nucleotide 3′ end of EST clone 307478 were spliced together by blunt end ligation at a unique Eco47III restriction site. Therefore, the full-length human TPST-2 cDNA is 1855 bp in length and contains a 197-nucleotide 5′-UTR, a 1131-nucleotide coding region, and a 527-nucleotide 3′-UTR. The sequences surrounding the proposed initiating ATG codons have a purine in position −3 and a cytosine in position +4, thereby conforming to Kozak consensus features (26). Both cDNAs have a single polyadenylation signal upstream from the beginning of the poly(A) tail.
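The segment lengths reported above can be cross-checked with simple arithmetic. The short Python sketch below sums the 5′-UTR, coding region, and 3′-UTR for each cDNA, compares the coding-region length with the predicted number of residues, and confirms that the two spliced human fragments add up to the full-length cDNA. All numbers come from the text; the data layout and function name are illustrative, and the residue comparison assumes that the reported coding-region lengths exclude the termination codon.

```python
# Lengths (in nucleotides or residues) as reported in the text.
CDNAS = {
    "mouse TPST-2": {"insert": 1760, "utr5": 156, "cds": 1128, "utr3": 476, "residues": 376},
    "human TPST-2": {"insert": 1855, "utr5": 197, "cds": 1131, "utr3": 527, "residues": 377},
}


def check_cdna(entry):
    """Return (segments sum to insert length, cds equals 3 nt per residue)."""
    segments_ok = entry["utr5"] + entry["cds"] + entry["utr3"] == entry["insert"]
    # Assumption: the reported coding region does not count the stop codon.
    codons_ok = entry["cds"] == 3 * entry["residues"]
    return segments_ok, codons_ok


for name, entry in CDNAS.items():
    print(name, check_cdna(entry))

# Human full-length cDNA assembled from two EST fragments at the Eco47III site.
spliced_length = 1087 + 768
print(spliced_length)
```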
The human and mouse cDNAs encode proteins of 377 and 376 amino acids with molecular masses of 41,909 Da for the human and 42,064 Da for the mouse protein, respectively. The predicted amino acid sequence of human TPST-2 (Fig. 1) is 96% identical to that of mouse TPST-2. Kyte-Doolittle hydrophobicity plots of human and mouse TPST-2 (not shown) reveal a 17-residue hydrophobic segment near the NH2 terminus (27). This segment is preceded by basic residues and is not followed by a suitable signal peptidase cleavage site. This indicates that TPST-2 has type II transmembrane topology. Both polypeptides have two potential sites for N-linked glycosylation and six lumenal cysteines. The amino acid sequences of human and mouse TPST-2 are 67 and 65% identical to human and mouse TPST-1, respectively.
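The kind of hydropathy analysis referred to above can be sketched in a few lines of Python. This is a minimal sliding-window implementation using the published Kyte-Doolittle hydropathy values, not the program used in this study. Because the TPST-2 sequence is not reproduced in the text, the sketch is applied to two short sequences that are given above (the hydrophobic transferrin signal peptide and the hydrophilic HPC4 epitope); the window size of 17 is an assumption chosen to match the length of the hydrophobic segment described.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(seq):
    """Average Kyte-Doolittle value over the whole sequence."""
    return sum(KD[aa] for aa in seq) / len(seq)


def hydropathy_profile(seq, window=17):
    """Mean hydropathy of every contiguous window of the given size."""
    if window > len(seq):
        raise ValueError("window is longer than the sequence")
    scores = [KD[aa] for aa in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


SIGNAL_PEPTIDE = "MRLAVGALLVCAVLGLCLA"  # transferrin signal peptide (from the text)
HPC4_EPITOPE = "EDQVDPRLIDGK"           # HPC4 epitope (from the text)

print(round(mean_hydropathy(SIGNAL_PEPTIDE), 2))
print(round(mean_hydropathy(HPC4_EPITOPE), 2))
print([round(v, 2) for v in hydropathy_profile(SIGNAL_PEPTIDE)])
```

A sustained run of strongly positive window averages near the NH2 terminus is the signature that, together with flanking basic residues and the absence of a cleavage site, supports a type II membrane topology.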
Expression and Characterization of Recombinant TPST-2-Full-length human and mouse TPST-2 were transiently expressed in 293-T cells as HPC4 fusion proteins. Extracts of cells transfected with empty plasmid or plasmid encoding human and mouse fusion proteins were prepared and assayed using a PSGL-1 peptide substrate (QATEYEYLDYDFLPEC). This peptide spans the three potential tyrosine sulfation sites in PSGL-1 (28) and was previously shown to serve as a substrate for TPST-1 (24). The specific activity of mock transfected 293-T cell extracts was 0.06 ± 0.01 units/mg (mean ± S.D., n = 6). When cells were transfected with human or mouse TPST-2 cDNA, the specific activity of 293-T cell extracts increased 112-fold (7.07 ± 1.28 units/mg, n = 3) and 46-fold (2.91 ± 0.86 units/mg, n = 5), respectively. TPST activity was not detectable in culture supernatants of cells transfected with TPST-2 cDNAs. 293-T cells were also transfected with cDNAs encoding soluble forms of human and mouse TPST-2 with NH2-terminal HPC4 epitopes. Assays of conditioned media indicated that TPST-2 was efficiently secreted in an active form. Conditioned medium from cells transfected with soluble human TPST-2 fusion protein was analyzed by Western blotting using HPC4 and an antiserum against the COOH-terminal 16 amino acids of TPST-2 (Fig. 2). Both HPC4 and the COOH-terminal peptide antiserum detected two closely spaced polypeptides of approximately 47 and 44 kDa. This demonstrates that TPST-2 is secreted as two distinct isoforms that are not the result of proteolytic degradation. To determine the structural basis for this heterogeneity, partially purified soluble TPST-2 was either sham-treated or treated with peptide N-glycosidase F and analyzed by Western blotting using the COOH-terminal peptide antiserum (Fig. 2). We observed that enzyme-treated TPST-2 migrated as a single polypeptide with an apparent molecular mass of ≈41 kDa. This result demonstrates that soluble TPST-2 is secreted with either one or two N-glycan chains. However, peptide N-glycosidase F-treated TPST-2 still appears somewhat heterogeneous, suggesting that it may contain O-linked glycosylation or perhaps some other post-translational modification.
Substrate Specificity of TPSTs-To determine whether TPST-1 and TPST-2 catalyze sulfation of other substrates, extracts of human TPST-1- and TPST-2-transfected 293-T cells were assayed using peptide substrates modeled on known tyrosine sulfation sites in heparin cofactor II (HCII) and the α chain of the fourth component of complement (C4α) (12,29). In parallel duplicate assays of extracts from three independent transfections, we observed that TPST-1 efficiently sulfated the PSGL-1, HCII, and C4α peptides (Fig. 3). The lower specific activity observed using the HCII peptide may be because it has only a single tyrosine, in contrast to the PSGL-1 and C4α peptides, which have three. We observed that the specific activities of extracts of TPST-1- and TPST-2-transfected 293-T cells were comparable using the PSGL-1 peptide as a substrate. In contrast, the specific activity of extracts of TPST-1-transfected cells was 21-fold higher using the HCII peptide as substrate and 9-fold higher using the C4α peptide when compared with TPST-2 extracts assayed in parallel. These preliminary experiments suggest that TPST-1 and TPST-2 differ in their specificities toward small peptide substrates in vitro.
Northern Blot Analysis-cDNAs were labeled with [α-32P]dCTP using random hexamer priming with Klenow fragment of DNA polymerase I and used to probe Northern blots of poly(A)+ RNA (CLONTECH, Palo Alto, CA) as described previously (24). This analysis showed a ≈1.8-2.0-kb TPST-2 transcript in all human and mouse tissues examined (Fig. 4). The larger hybridizing species observed in pancreatic tissue likely represent incompletely processed transcripts.
Chromosomal Localization of the Human and Mouse TPST-2 Genes-Searches of the NCBI data base revealed that sequences matching the human TPST-2 cDNA were located in a human BAC clone (445C9, GenBank™ accession number Z95115). This BAC clone was sequenced at the Sanger Center (Cambridge, UK) and maps to chromosome 22q12.1. Alignment of the cDNA and genomic sequence shows that the TPST-2 gene spans ≈63.4 kilobase pairs and contains 7 exons and 6 introns (Fig. 5). The TPST-2 gene is centromeric to two known genes in the BAC clone, βB1-crystallin and βA4-crystallin, and is transcribed from telomere to centromere. Intron 1 is unusually large (≈45.4 kb) and contains a high mobility group-1 pseudogene (30). The coding region spans exons III to VI. The nucleotide sequences at the 5′ donor and 3′ acceptor sites of all introns conform to the GT..AG rule (31). There were only three nucleotides in the human TPST-2 cDNA sequence that did not match the published genomic sequence. Two are conservative substitutions in the coding region (C467→G, T897→C), and one is in the 3′-UTR (C1847→T), 7 nucleotides 5′ to the polyadenylation site.
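The GT..AG rule mentioned above states that an intron begins with the dinucleotide GT at its 5′ donor site and ends with AG at its 3′ acceptor site. A minimal Python sketch of such a check is given below; the function names are chosen for illustration and the example intron sequences are hypothetical placeholders, not sequences from the TPST-2 gene.

```python
def conforms_to_gt_ag(intron):
    """True if the intron starts with GT (donor) and ends with AG (acceptor)."""
    seq = intron.upper()
    return len(seq) >= 4 and seq.startswith("GT") and seq.endswith("AG")


def intron_count(exon_count):
    """A gene with n exons has n - 1 introns separating them."""
    if exon_count < 1:
        raise ValueError("a gene must have at least one exon")
    return exon_count - 1


# Hypothetical intron sequences for illustration only.
EXAMPLE_INTRONS = {
    "canonical": "GTAAGTCCTTTCTTTCAG",
    "noncanonical": "GCAAGTCCTTTCTTTCAG",
}

for label, seq in EXAMPLE_INTRONS.items():
    print(label, conforms_to_gt_ag(seq))

# The 7 exons reported for the human TPST-2 gene imply 6 introns.
print(intron_count(7))
```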
Warden et al. (32) reported the chromosomal mapping of 40 mouse liver cDNA clones by interspecies backcross analysis. One of the mapped EST clones (m1650) was partially sequenced on both strands (GenBank™ accession numbers L11849 and L12133). These sequences are >95% identical to nucleotides 12-155 and nucleotides 1109-1544 of the mouse TPST-2 cDNA, respectively. This EST clone defines the D5Ucla3 locus located in the central region of mouse chromosome 5 (Mouse Genome Data Base, The Jackson Laboratories).
Identification of a TPST in C. elegans-Searches of the NCBI data base also identified two overlapping C. elegans EST clones (yk166c1 and yk363g6). These clones were obtained from Dr. Yuji Kohara (National Institute of Genetics, Mishima, Japan), and the nucleotide sequences of both strands were determined. Clone yk166c1 is a full-length cDNA with a 1416-bp insert comprising a 54-nucleotide 5′-UTR, a 1140-nucleotide coding region, and a 222-nucleotide 3′-UTR. The C. elegans cDNA predicts a protein of 380 amino acids that we designate TPST-A. Hydropathy analysis indicates that the protein has type II transmembrane topology (not shown). The polypeptide has one potential N-glycosylation site and five lumenal cysteine residues. Alignment of the predicted amino acid sequence of the C. elegans protein to the human proteins shows that it is 54 and 52% identical to human TPST-1 and TPST-2, respectively (Fig. 6). C. elegans TPST-A was expressed as a full-length protein in 293-T cells from the unmodified pcDNA3.1(+) vector. Transfection of 293-T cells with C. elegans TPST-A cDNA resulted in a 40-fold increase in the specific activity of the cell extracts when compared with mock transfected controls using the PSGL-1 peptide as substrate (n = 2). Searches of the high throughput genomic sequence data base indicate that the TPST-A gene is located on a yeast artificial chromosome clone mapped to the right arm of chromosome III (Y111B2, GenBank™ accession number Z98857) that is currently being sequenced at the Sanger Center (33-35).
Fig. 2 (legend, partial). The sample in lane 3 was sham-treated and that in lane 4 was treated with peptide N-glycosidase F, as described previously (24). DF, dye front.
Data base searches also revealed a second putative C. elegans TPST gene. This gene is in cosmid F42G9 (GenBank™ accession number U00051) that was sequenced at the Genome Sequencing Center at Washington University (St. Louis, MO) and maps to the left arm of chromosome III. The cosmid contains a predicted open reading frame, designated F42G9.8, which predicts a 359-amino acid type II transmembrane protein. Alignment of TPST-A and F42G9.8 reveals 39% identity and 62% similarity at the amino acid level. However, it is not known whether F42G9.8 encodes a TPST.

DISCUSSION
Tyrosine O-sulfation is a common post-translational modification in eukaryotes mediated by a Golgi enzyme activity called tyrosylprotein sulfotransferase (TPST). We recently reported the isolation and characterization of human and mouse cDNAs encoding TPST-1 (24). Here we report the isolation and expression of human and mouse cDNAs encoding a second member of the TPST family, TPST-2. The human and mouse TPST-2 cDNAs predict type II transmembrane proteins of 377 and 376 amino acid residues, respectively. The predicted molecular weight of the TPST-2 coding region, in conjunction with the two potential N-glycosylation sites, is consistent with the size of recombinant TPST-2 as assessed by SDS-polyacrylamide gel electrophoresis. When transfected into mammalian cells, human and mouse cDNAs encoding either full-length or soluble forms of TPST induce overexpression of TPST activity in cell extracts and conditioned medium, respectively. These data demonstrate that the cDNAs encode a tyrosylprotein sulfotransferase. We have also identified a cDNA from the nematode C. elegans encoding a 380-amino acid type II protein that induces overexpression of TPST activity when expressed in mammalian cells. In addition, it is likely that the F42G9.8 gene product is a second C. elegans TPST. Fig. 6 shows the alignment of human TPST-1, human TPST-2, and C. elegans TPST-A. Sequence identity among the three TPSTs is restricted to the COOH-terminal portion of the proteins, with very little homology in the NH2-terminal ≈70 amino acids. The relative positions of all of the intralumenal cysteine residues in TPST-1 are conserved in TPST-2. However, in C. elegans TPST-A the most membrane-proximal intralumenal cysteine is absent. Kakuta et al. (36,37) have defined two structural motifs common to most cytosolic and membrane-bound sulfotransferases. In the mouse estrogen sulfotransferase crystal structure these regions are involved in binding of the 5′ and 3′ phosphates on the sulfate donor PAPS. These motifs are highly conserved in the three TPSTs (Fig. 6).
Fig. 6 (legend). The alignment of human TPST-1, human TPST-2, and C. elegans TPST-A was produced using the PILEUP program (Genetics Computer Group). Amino acid identities are shaded. Invariant cysteine residues are boxed, and the transmembrane domain regions are underlined. Residues homologous to those involved in co-substrate binding in the estrogen sulfotransferase crystal structure (24,37) are indicated by a dot below the sequence alignment.
Previous work indicates that TPST activity is localized to the trans-Golgi in mammalian cells (22,38,39). Several lines of evidence support the conclusion that TPST-2, like TPST-1, is a membrane-bound Golgi enzyme. TPST-2 is predicted to have type II topology, like most known Golgi sulfotransferases and glycosyltransferases. The presence of N-linked glycans demonstrates that TPST-2, like TPST-1, transits the Golgi compartment. Furthermore, TPST activity was not detectable in culture supernatant of 293-T cells overexpressing full-length TPST-2, suggesting that it is retained in an intracellular compartment. Our previous study demonstrated that TPST activity measured using the PSGL-1 peptide as substrate was detectable only in the microsomal fraction of crude rat liver homogenates. We also provided evidence that the active site of the enzyme(s) was lumenally oriented in microsomes (24). Given that the PSGL-1 peptide is a substrate for both TPST-1 and TPST-2, this would suggest that the enzymes have a similar subcellular localization and active site orientation in microsomal membranes. However, the subcellular localizations of TPST-1 and TPST-2 have not been rigorously determined.
TPST-1- and TPST-2-specific transcripts are present in many mammalian tissues. In addition, both transcripts are present in multiple tumor cell lines and in human umbilical vein endothelial cells as assessed by Northern blotting. These data suggest that TPST-1 and TPST-2 are coexpressed in many, if not all, mammalian cells. These observations raise the question as to whether the two enzymes are functionally redundant or alternatively whether they have preferred substrates. Our observation that TPST-1 and TPST-2 may differ in their ability to mediate tyrosine O-sulfation of certain small peptide substrates in vitro supports the latter hypothesis. Determination of kinetic rate constants of both enzymes toward various substrates will be required to determine the kinetic basis for these differences. It is not known whether the two enzymes differentially sulfate native protein substrates. Development of expression systems lacking one or both enzymes will be required to address the more complicated question as to whether the two enzymes have distinct macromolecular specificities in intact cells or organisms.
In summary, we have cloned and expressed a second mammalian sulfotransferase and a C. elegans sulfotransferase that catalyze tyrosine O-sulfation. These enzymes may catalyze tyrosine O-sulfation of a variety of protein substrates involved in diverse physiologic functions.