Molecular Cloning and Functional Expression of Two Splice Forms of Human N-Acetylglucosamine-1-phosphodiester α-N-Acetylglucosaminidase*

We have isolated and sequenced human cDNA and mouse genomic DNA clones encoding N-acetylglucosamine-1-phosphodiester α-N-acetylglucosaminidase (phosphodiester α-GlcNAcase), which catalyzes the second step in the synthesis of the mannose 6-phosphate recognition signal on lysosomal enzymes. The gene is organized into 10 exons. The protein sequence encoded by the clones shows 80% identity between human and mouse phosphodiester α-GlcNAcase and no homology to other known proteins. It predicts a type I membrane-spanning glycoprotein of 514 amino acids containing a 24-amino acid signal sequence, a luminal domain of 422 residues with six potential N-linked glycosylation sites, a single 27-residue transmembrane region, and a 41-residue cytoplasmic tail that contains both a tyrosine-based and an NPF internalization motif. Human brain expressed sequence tags lack a 102-base pair region present in human liver cDNA that corresponds to exon 8 in the genomic DNA and probably arises via alternative splicing. COS cells transfected with the human cDNA expressed 50–100-fold increases in phosphodiester α-GlcNAcase activity, proving that the cDNA encodes the subunits of the tetrameric enzyme. Transfection with cDNA lacking the 102-base pair region also gave active enzyme. The complete genomic sequence of human phosphodiester α-GlcNAcase was recently deposited in the database. It showed that our cDNA clone was missing only the 5′-untranslated region and initiator methionine and revealed that the human genomic DNA has the same exon organization as the mouse gene.

The biosynthesis of the mannose 6-phosphate recognition signal on the oligosaccharides of lysosomal acid hydrolases occurs in the Golgi apparatus and is catalyzed by the sequential action of two enzymes. The first step is the addition of N-acetylglucosamine-1-P to the C-6 hydroxyl group on selected mannose residues in the high mannose oligosaccharides of lysosomal enzymes that serve as substrates for UDP-N-acetylglucosamine:lysosomal enzyme N-acetylglucosamine-1-phosphotransferase. The second step is catalyzed by N-acetylglucosamine-1-phosphodiester α-N-acetylglucosaminidase (phosphodiester α-GlcNAcase), which removes the covering GlcNAc to expose the mannose 6-phosphate recognition signal on the lysosomal acid hydrolases (1). These lysosomal enzymes can then bind to one of the two mannose 6-phosphate receptors in the trans-Golgi network (TGN) and be transferred to endosomes and subsequently to lysosomes (2, 3). The phosphodiester α-GlcNAcase plays an important role in lysosomal enzyme targeting because the mannose 6-phosphate receptors do not bind GlcNAc-P-Man. We have studied the kinetics, substrate specificity, and hydrodynamic properties of phosphodiester α-GlcNAcase from bovine liver (4, 5) and recently purified it over 600,000-fold to homogeneity using a two-step immunoaffinity purification procedure (6). The native membrane-associated enzyme exists as a tetramer (272 kDa) composed of two dimers (136 kDa), each containing a pair of disulfide-linked monomers (68 kDa). The monomers are N-glycosylated with complex type oligosaccharides, and their mobility on reducing SDS-PAGE changes to about 50 kDa after digestion with peptide N-glycosidase F. Interestingly, this presumably cis-Golgi-acting enzyme is sialylated to a significant extent (3.8 mol/mol of monomer), indicating that it must travel to the trans-Golgi network where sialyltransferase resides.
In this study we describe the cloning and expression of the cDNA encoding human phosphodiester α-GlcNAcase and the evidence that there are two splice forms of the mRNA for the enzyme. In contrast to glycosyltransferases and other glycoprotein-processing enzymes in the Golgi apparatus, which to date are all type II membrane-spanning proteins (7), the phosphodiester α-GlcNAcase is composed of identical type I membrane-spanning subunits.

EXPERIMENTAL PROCEDURES
Materials-The enzymes were obtained from the following suppliers: Taq DNA polymerase, EcoRI, and HindIII from Promega; BamHI from New England Biolabs; and T4 DNA ligase from Life Technologies, Inc. Transformation-competent DH5α cells were from Life Technologies, Inc. The protease inhibitor mixture (1000×) was prepared by combining antipain, chymostatin, leupeptin, and pepstatin A, all from Sigma, at 1 mg/ml each.
Preparation and Sequencing of Peptides from Pure Bovine Phosphodiester α-GlcNAcase-Bovine phosphodiester α-GlcNAcase was purified and subjected to amino-terminal sequencing by Edman degradation as described previously (6). In addition, to obtain internal peptide sequence, the pure enzyme was subjected to trypsin digestion in solution, and the digest was fractionated by reverse phase microbore high pressure liquid chromatography. A number of peptides were obtained, and two peptides yielded single, unique amino acid sequences.
* This work was supported in part by National Institutes of Health Grant CA08759 and American Heart Association, Oklahoma Affiliate, Grant 970805S. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact. The nucleotide sequence(s) reported in this paper has been submitted to GenBank™.
Isolation of cDNAs Encoding Human Phosphodiester α-GlcNAcase-The amino acid sequences of the three bovine phosphodiester α-GlcNAcase peptides were used to BLAST search the non-redundant GenBank™ EST database (8), and four clones were found that had sequence homologous to tryptic peptide 3 (see Table I) at their 5′ ends. These EST clones from the I.M.A.G.E. Consortium (9) were all obtained from ATCC and included three human infant brain clones: I.M.A.G.E. Consortium clone ID 43150 (GB R60451, ATCC 367524), I.M.A.G.E. Consortium clone ID 28009 (GB R13396, ATCC 354452), and I.M.A.G.E. Consortium clone ID 23953 (GB T77682, ATCC 328360), as well as one mouse embryo clone, I.M.A.G.E. Consortium clone ID 354478 (GB W45769, ATCC 811592). Sequencing of the four cDNAs using the ABI Prism Dye Terminator Cycle Sequencing Ready Reaction mixture (Perkin-Elmer) showed that all three human clones were identical except for small variations in length at the 3′ and 5′ ends. The sequence of the mouse embryo clone was homologous to the human infant brain clones except that it contained a 102-base pair insert not present in the latter clones.
The plasmid DNA of the human infant brain clone (I.M.A.G.E. ID 43150), which had an insert of about 1.4 kb, was digested with HindIII to liberate a 700-bp fragment from its extreme 5′ end. The DNA fragment was gel-purified, and 50 ng were labeled with 100 μCi of [α-32P]dCTP using the High Prime DNA labeling kit from Roche Molecular Biochemicals according to the manufacturer's instructions. The labeled probe was used to screen a human liver 5′-Stretch plus cDNA library (CLONTECH HL 5022t) in the TriplEx phagemid vector. Eight strongly positive colonies were detected in the primary screen of 10^6 plaque-forming units carried out as described by the manufacturer. In the secondary screen of these 8, strongly positive colonies were seen on 7 plates, and a number of separate colonies from each positive plate were cloned. The clones were converted to pTriplEx plasmids making use of the Cre-Lox recombinase feature of the vector in the host cells Escherichia coli BM 258, supplied by CLONTECH. The plasmid DNAs were subjected to restriction digestion to excise the cloned inserts. The inserts that also contained sequences of the EST varied in size from 1000 to 2200 bp. Clone 6.5, an example of the largest, was taken for the rest of the analysis.
Northern Blot-The BamHI-HindIII fragment of clone 6.5, gel-purified and labeled with [α-32P]dCTP as above, was used to probe a human multiple tissue Northern blot (CLONTECH) using ExpressHyb hybridization solution and the protocol supplied by the manufacturer.
Isolation of Mouse Phosphodiester α-GlcNAcase Genomic DNA-A mouse genomic P1 clone that was positive when probed with a 32P-labeled fragment of clone 6.5 cDNA was obtained from Genome Systems Inc. The clone was sequenced by standard methods in the Molecular Biology Resource Facility of the W. K. Warren Medical Research Institute.
Construction of an Expression Clone for Human Phosphodiester α-GlcNAcase-Since clone 6.5 was lacking an initiator methionine and 5′-untranslated region but did contain a predicted signal sequence, PCR was used to generate a 5′-terminal fragment of clone 6.5 with a start codon. The 5′ sense primer contained 14 bp of unmatched sequence encoding an EcoRI site, a Kozak consensus sequence (10), and the Met codon, followed by the first 17 bp of the clone 6.5 sequence, and was 5′-GGAATTCCACCATGGCGACCTCCACGGGTCG-3′. The 3′-antisense primer corresponded to nucleotides 520-538 downstream of a BamHI site at nucleotide 512 (in the full-length sequence shown in Fig. 1) and was 5′-TGACCAGGGTCCCGTCGCG-3′. The PCR was performed with Taq polymerase in PCR buffer containing 1.3 M betaine and 1% Me2SO using clone 6.5 DNA as template for 25 cycles at an annealing temperature of 45°C. The PCR product of approximately 500 bp was gel-purified using the QIAquick Gel Extraction Kit (Qiagen, Inc.) and digested with EcoRI and BamHI to generate a fragment, also gel-purified, for ligation into the expression vector pcDNA3.1(−) obtained from Invitrogen. The vector was also digested with EcoRI and BamHI to provide an acceptor site in its multiple cloning site, and after gel purification it was treated with calf intestine alkaline phosphatase (New England Biolabs) and subjected to gel purification. The ligation was performed with T4 DNA ligase, and the reaction mixture was used to transform DH5α-competent cells, which yielded many well separated ampicillin-resistant colonies; 10 of these were cloned, and all showed a 520-bp insert after EcoRI/BamHI digestion. Four of the clones were sequenced using sense and antisense primers in the pcDNA3.1(−) vector. All four were identical both to each other and to clone 6.5 except for the added EcoRI-Kozak-ATG sequence, which was intact and in-frame with the first codon of clone 6.5.
By similar procedures, the BamHI/HindIII fragment of clone 6.5 was ligated into this newly constructed expression vector between the BamHI site and the downstream HindIII site in the multiple cloning site. A number of clones were isolated, and all contained an EcoRI/HindIII insert of approximately 1600 bp as expected.
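The anatomy of the 5′ sense primer described above can be checked with a short script. This is only an illustrative sketch: the primer sequence is the one quoted in the text, while the function name `split_primer` and the constant names are hypothetical and not part of the original methods.

```python
# Sense primer quoted in the text, written 5' -> 3'.
SENSE_PRIMER = "GGAATTCCACCATGGCGACCTCCACGGGTCG"

ECORI_SITE = "GAATTC"       # EcoRI recognition sequence
KOZAK_WITH_ATG = "CCACCATGG"  # Kozak-like context around the added ATG


def split_primer(primer: str, added_len: int = 14):
    """Split a primer into its non-templated 5' extension and the
    3' portion that matches the template (clone 6.5)."""
    return primer[:added_len], primer[added_len:]


added, matched = split_primer(SENSE_PRIMER)

print("added 5' extension :", added, f"({len(added)} nt)")
print("template-matching  :", matched, f"({len(matched)} nt)")
print("contains EcoRI site:", ECORI_SITE in added)
print("extension ends ATG :", added.endswith("ATG"))
print("first clone codon  :", matched[:3])  # codon immediately after the ATG
```

The 14-nt extension and 17-nt matching region add up to the 31-nt primer, and the first template codon (GCG, alanine) sits directly after the added ATG, in keeping with the statement that the initiator methionine is followed by alanine.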
Construction of Phosphodiester α-GlcNAcase Mutants for Expression-Mutations were inserted in the human phosphodiester α-GlcNAcase expression clone by constructing mutants in the parental clone 6.5 in TriplEx between the BamHI and HindIII sites, verifying the mutation via sequencing, and then performing a restriction digest followed by insertion of the purified mutant BamHI-HindIII fragment into the new expression vector. A deletion variant missing the 102-base pair insert not found in the human brain EST was constructed using the gene splicing by overlap extension (SOEing) method of Horton (11). In the case of human phosphodiester α-GlcNAcase, "gene 1" is clone 6.5 from slightly before the BamHI site through nucleotide 1174, and "gene 2" is from nucleotide 1277 to slightly past the HindIII site. The sense primer a for gene 1 was 5′-GCTGCAGAACGCGCAGTTCG-3′, and the antisense primer b, which overlapped for 18 residues the sense primer c for gene 2, was (5′ to 3′) GGTTGACGTCACTTCATTTCGTCACAGAGGTCG. For gene 2 the sense primer c was 5′-TAAAGCAGTGTCTCCAGC-3′, and the antisense primer d (5′ to 3′) was GAGGACACCCAGATGGTC. Two PCR reactions were run using clone 6.5 as template. The products AB and CD were gel-purified, and these overlapping double-stranded fragments were used as the template for another PCR reaction primed with primer a and primer d to make product AD containing the 102-bp deletion. Following digestion with BamHI and HindIII, the Δ102-bp variant was inserted into the expression vector pcDNA3.1(−) as above. A mutant truncated just 5′ of the transmembrane domain (W450stop) was constructed using the QuikChange™ Site-directed Mutagenesis kit of Stratagene and a pair of priming nucleotides encompassing the desired W450stop: sense strand primer CCAGGACCGCCTGACTAGCCCTCACCCTGGCG and antisense strand primer (5′-3′) CGCCAGGGTGAGGGCTAGTCAGGCGGTCCTGG.
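Two details of the mutant constructions lend themselves to a quick consistency check: the size of the deleted segment implied by the "gene 1" and "gene 2" boundaries, and the complementarity of the two W450stop mutagenesis primers. The sketch below uses the nucleotide positions and primer sequences quoted in the text (with the line-break hyphens of the printed sequences removed); the helper name `reverse_complement` and the variable names are illustrative assumptions.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


# Boundaries quoted in the text: gene 1 ends at nt 1174, gene 2 starts at nt 1277.
GENE1_LAST_NT = 1174
GENE2_FIRST_NT = 1277
deleted_bp = GENE2_FIRST_NT - GENE1_LAST_NT - 1  # nucleotides 1175..1276

# W450stop mutagenesis primers quoted in the text (both written 5' -> 3').
W450STOP_SENSE = "CCAGGACCGCCTGACTAGCCCTCACCCTGGCG"
W450STOP_ANTISENSE = "CGCCAGGGTGAGGGCTAGTCAGGCGGTCCTGG"

print("deleted segment (bp):", deleted_bp)
print("deletion keeps reading frame:", deleted_bp % 3 == 0)
print("primers are reverse complements:",
      reverse_complement(W450STOP_SENSE) == W450STOP_ANTISENSE)
```

The boundaries remove nucleotides 1175 through 1276, that is 102 bp or 34 codons, so the deletion leaves the downstream reading frame intact.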
Transfection of COS Cells and Assay for Phosphodiester α-GlcNAcase Expression-COS cells were grown in Dulbecco's modified Eagle's medium supplemented with 10% fetal calf serum (37°C, 5% CO2) in 60-mm plates until they reached 50-80% confluence. The plates were then washed with serum-free Opti-MEM I and transfected with 2 μg of plasmid DNA per plate using the LipofectAMINE Plus reagent (Life Technologies, Inc.) and the protocol provided by the manufacturer. After 3 h the plates were supplemented with additional Opti-MEM I and fetal calf serum to bring the volume to 3 ml and 10% fetal calf serum. At 24 h the medium was changed to fresh Dulbecco's modified Eagle's medium, 10% fetal calf serum, or in the case where the medium was to be sampled, to fresh Opti-MEM I with no serum. Cells and medium were harvested after an additional 24 h in the latter case, but cells in serum were harvested after another 48 h. The cells were scraped up and washed twice in Tris-buffered saline, resuspended in 500 μl of 10 mM Tris, pH 7.4, containing a protease inhibitor mixture (1 μg/ml), and subjected to two 10-s bursts on a Fisher sonicator. The sonicates were centrifuged in the Beckman Optima TL tabletop ultracentrifuge at 70,000 rpm for 30 min. The supernatant was removed and saved, and the membrane pellet was resuspended in 50 mM Tris, pH 7.4, 1 mM MgCl2, 1 mM CaCl2, 1.5% Lubrol, sonicated as above, and centrifuged in the Hermle Z323K at 17,000 rpm (30,000 × g) for 5 min. The supernatant fraction contained the solubilized membranes. When the medium was sampled, it (3 ml) was removed from the plate and added to a tube containing protease inhibitors before the cells were worked up. The medium was then concentrated in a Centricon 30 apparatus before assay. All the fractions were assayed for phosphodiester α-GlcNAcase activity and protein content.
Protein Determination-Protein concentration was measured using the Micro BCA assay (Pierce) standardized with bovine serum albumin (13).

SDS-Polyacrylamide Gel Electrophoresis and Western Blotting-Samples were subjected to reducing SDS-PAGE in 7.5% gels in Tris glycine buffer and transferred to nitrocellulose. The blots were probed with affinity purified rabbit anti-peptide antibody (1:1200 dilution) (6) and detected with goat anti-rabbit IgG (Pierce) (1:5000 dilution) and ECL reagents (Amersham Pharmacia Biotech). The intensity of the bands on the Western blot was quantitated by laser densitometry using the Molecular Dynamics Personal Densitometer ImageQuant system.

RESULTS
Cloning of the cDNA Encoding Human Phosphodiester α-GlcNAcase-Affinity purified, homogeneous bovine phosphodiester α-GlcNAcase was subjected to amino-terminal amino acid sequencing as described previously (6). The pure enzyme was also subjected to trypsin digestion and high pressure liquid chromatography to generate two internal tryptic peptides which were sequenced. The amino acid sequences of these three peptides are shown in Table I. The protein, nucleotide, and EST databases were searched for sequences that matched these peptide sequences, and several human and mouse ESTs were found that had the sequence of peptide 3 at their amino termini. Three human infant brain EST clones and one mouse embryo clone were obtained from ATCC and sequenced by us. The three human clones were all identical except for total length at their 3′ ends and virtually identical to the mouse clone, except that the mouse EST contained a 102-bp region that was absent from all three human brain ESTs. An EcoRI-HindIII fragment of about 700 bp was excised from the human cDNA clone (ATCC 367524) and used to probe a human liver cDNA library directionally cloned in TriplEx vector (CLONTECH). Of the positive clones isolated from the library and converted to plasmids (pTriplEx), the largest (2200 bp) was represented by clone 6.5, which was used for the rest of the analysis. The nucleotide sequence of clone 6.5 cDNA encoding phosphodiester α-GlcNAcase and the corresponding deduced amino acid sequence are shown in Fig. 1, which also shows, in italics, the 5′-untranslated sequence and Met-encoding ATG derived from the recently deposited human genomic DNA sequence. The cDNA clone has been completely sequenced on both strands and is a novel sequence that predicts a mature protein of about 50 kDa, which is in agreement with the size of the deglycosylated mature bovine liver phosphodiester α-GlcNAcase. There is a unique BamHI site at base 512 and a unique HindIII site at base 1581.
The schematic diagram of the amino acid sequence and the Kyte-Doolittle hydrophilicity plot shown in Fig. 2 highlight some of the features of the phosphodiester α-GlcNAcase structure. All three bovine peptide sequences (1, 2, and 3) were found. Although the sequences of peptides 2 and 3 in the human are 100% identical to the bovine sequences, the amino-terminal peptide in human is only 67% identical to the bovine sequence. The human liver clone contains the 102-base pair insert that has the characteristics of an alternatively spliced segment that was missing in the human brain EST. The hydrophilicity plot indicates the presence of a hydrophobic membrane-spanning region from amino acids 448 to 474 and another hydrophobic region from amino acid 8 to 26 which fits the motif for a signal sequence, and there is a likely signal sequence cleavage site between Gly-25 and Leu-26 (14). There are six Asn-X-Ser/Thr potential N-linked glycosylation sites, one of which is within the 102-bp insert. All of these sites are amino-terminal of the putative transmembrane region. These features indicate that the phosphodiester α-GlcNAcase is a type I membrane-spanning glycoprotein with the amino terminus in the lumen of the Golgi and the carboxyl terminus in the cytosol. This orientation is different from that of other glycosyltransferases and glycosidases involved in glycoprotein processing, which to date have been shown to be type II membrane-spanning proteins (7). Interestingly, there is a potential tyrosine-based internalization signal (Y488HPL491) in the cytoplasmic tail of the phosphodiester α-GlcNAcase sequence that suggests that the enzyme may travel to the TGN (where it becomes sialylated) or even to the plasma membrane before being retrieved to its site of action in the cis/medial-Golgi. A second potential retrieval signal in the phosphodiester α-GlcNAcase sequence is the carboxyl-terminal NPFKD.
In yeast the sequence NPFXD has been shown to act as an endocytosis signal (15), and more recently (16-20) peptides and proteins containing the NPF motif have been shown to interact with the Eps15 homology domain (EH domain). The cDNA of clone 6.5 shown is missing the initiation codon and 5′-untranslated region.
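The search for Asn-X-Ser/Thr sites described above is straightforward to reproduce on any protein sequence. The following is a minimal sketch, not the procedure used by the authors: it applies the commonly used convention that X may be any residue except proline (an assumption added here, since the text states only Asn-X-Ser/Thr), and the example sequence is hypothetical because the full protein sequence of Fig. 1 is not reproduced in this text.

```python
import re

# N-linked glycosylation sequon: Asn, any residue except Pro, then Ser or Thr.
# The lookahead lets overlapping sequons be reported as well.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(protein: str) -> list[int]:
    """Return 1-based positions of the Asn in each potential N-glycosylation sequon."""
    return [match.start() + 1 for match in SEQUON.finditer(protein.upper())]


# Hypothetical test sequence (NOT the phosphodiester alpha-GlcNAcase sequence).
EXAMPLE = "MNATKLNPSGNVTQ"

print(find_sequons(EXAMPLE))
```

In the example, the Asn residues at positions 2 (NAT) and 11 (NVT) are reported, while the Asn at position 7 is skipped because it is followed by proline.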
Size of the Human mRNA and Organization of the Mouse and Human Genes for Phosphodiester α-GlcNAcase-The human cDNA clone 6.5 is about 2.2 kb in size, and a fragment of it was labeled with 32P and used to probe a human multiple tissue Northern blot (CLONTECH) as shown in Fig. 3. An mRNA of approximately 2.4 kb was detected for all tissues except brain, where the band was somewhat smaller (2.3 kb), consistent with the fact that the ESTs isolated from human brain are missing a 102-bp segment present in the human liver cDNA of clone 6.5. In addition, another fainter mRNA band at about 3.5 kb is present in the liver.
A mouse genomic P1 clone was obtained that was positive when probed with 32P-labeled DNA fragments of the human clone 6.5 cDNA. Sequencing of this P1 clone revealed the intron/exon organization of most of the mouse gene, which is presented schematically in Fig. 4. The intron/exon borders were identified, and they are presented in Table II. The P1 clone was 9.8 kb in length and contained a large intron at the 5′ end followed by what is designated as exon 2, which encodes the mouse equivalent of amino acids 30-179 in the human phosphodiester α-GlcNAcase. Thus this genomic P1 clone is missing at least one 5′ exon that encodes the first 29 amino acids or more. The 102-base pair insert that is missing from the human brain ESTs exactly comprises exon 8, indicating that those mRNAs are the result of alternative splicing. Exon 10 encodes the transmembrane domain and the cytoplasmic tail through the stop codon, followed by a 3′-untranslated region of at least 535 bp which occurs at the 3′ end of the P1 clone. While this manuscript was in preparation, a routine search of the nonredundant database with the human cDNA sequence revealed that a recently deposited 160 kb of human genomic sequence (AC007011 from Los Alamos Laboratory) contained a 10-kb segment corresponding to the genomic sequence of the human phosphodiester α-GlcNAcase. The intron/exon organization of the human gene is presented schematically in Fig. 4 for comparison to the mouse gene. Similarly, the intron/exon borders are also presented in Table II. The exon sizes are the same for mouse and human, and the human exon 1 (missing in the mouse genomic clone) encodes the 5′-untranslated region and the initiator methionine, which are absent in the human cDNA clone. The methionine is followed by the alanine which is the first amino acid encoded by the cDNA clone. Fig. 5 shows a comparison of the mouse phosphodiester α-GlcNAcase amino acid sequence to that of the human sequence.
The two sequences are 80% identical overall as shown in the boxed areas. The identity in the 41 amino acids of the cytosolic tail is less (73%), but the tetrapeptide potential internalization signal YHPL is completely conserved as is the COOH-terminal pentapeptide NPFKD, suggesting it may play a role in the trafficking of phosphodiester α-GlcNAcase.
Expression of a Modified Human Clone 6.5 cDNA in COS Cells Induced High Levels of Phosphodiester α-GlcNAcase Activity-As noted above, we have been unsuccessful in obtaining a full-length cDNA encoding the 5′-untranslated region and initiator methionine of human phosphodiester α-GlcNAcase despite trying a number of strategies to extend clone 6.5 to the 5′ end. Similarly the mouse P1 clone is missing the 5′ end of the mouse gene. However, since clone 6.5 encoded a very good signal sequence and signal peptide cleavage site, we reasoned that it may be missing only a few amino acids including an initiator methionine. Accordingly we constructed a modified 5′ end on clone 6.5 by adding an EcoRI site, a Kozak consensus sequence, and ATG to encode methionine in frame just before the first nucleotide of clone 6.5 and inserted the construct into the expression vector pcDNA3.1(−) (Invitrogen) as described under "Experimental Procedures." Given the sequence we know from the human genomic DNA, it is now evident that our construct, in fact, encodes the entire phosphodiester α-GlcNAcase sequence for expression. A deletion mutant of clone 6.5 missing the 102-bp insert (Δ102-bp variant) was also constructed and put into pcDNA3.1(−). When these expression plasmids were transfected into COS cells and the cells were harvested 48 h after replacing the transfecting DNA with full medium, the results shown in Table III were obtained. Duplicate plates of cells expressing the full-length and the Δ102-bp variant as well as mock-transfected cells were lysed by sonication and subjected to high speed centrifugation to sediment a membrane pellet that was separated from the supernatant cytosol. The membranes were solubilized in detergent, and both fractions were assayed for phosphodiester α-GlcNAcase activity. Between 70 and 75% of the enzyme activity in the transfected cells was in the membrane pellet, and 78% of the endogenous COS cell phosphodiester α-GlcNAcase was in the membrane pellet.
Table III shows that the solubilized membranes of COS cells have a level of endogenous phosphodiester α-GlcNAcase activity (21.5 nmol/h/mg protein) that is about 4 times higher than that of bovine liver membranes (5.2 nmol/h/mg) (6). However, transfection with the full-length human liver cDNA resulted in an average increase of 50-fold in enzyme-specific activity, and transfection with the Δ102-bp variant cDNA caused an average increase of 90-fold in phosphodiester α-GlcNAcase activity. This result shows that the 102-bp region is dispensable for enzymatic activity.
An aliquot of each solubilized membrane extract was subjected to reducing SDS-PAGE, and the gel was blotted onto nitrocellulose that was probed with an antibody raised against a peptide in the amino terminus of the bovine liver phosphodiester α-GlcNAcase. The Western blot shown in Fig. 6 reveals several things about the human liver enzyme expressed in COS cells as follows: 1) the full-length protein has a molecular weight of about 77,000, which is consistent with a polypeptide of 490 amino acids (after cleavage of the signal peptide) bearing 6 N-linked oligosaccharides and is similar in size to the bovine liver enzyme on SDS-PAGE (68-72 kDa); 2) the Δ102-bp variant has a molecular weight of about 69,000 (about 8000 smaller than the full-length), consistent with its missing 102 bp, or about 3700 molecular weight of peptide, and a single N-linked oligosaccharide; and 3) both human proteins cross-react with the antipeptide antibody despite the fact that the peptide to which it was raised (amino acids 3-15 of the bovine amino-terminal peptide in Table I) differs in three of the 13 amino acids from the human sequence. Endogenous COS cell phosphodiester α-GlcNAcase, in contrast, does not cross-react with the antibody. In an effort to evaluate whether the Δ102-bp variant protein really had a higher intrinsic enzymatic activity than the full-length protein (as opposed to being expressed at higher copy number per cell), we quantitated the amount of antigen protein in the blots in Fig. 6 using laser densitometry to integrate the volume of each band on the blot. These values appear in the third column in Table III and show that the Δ102-bp variant protein has, on average, about one-half the phosphodiester α-GlcNAcase activity per antigen unit as the full-length protein.
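The size argument for the Δ102-bp variant can be written out as simple arithmetic. The sketch below is illustrative only: it assumes an average residue mass of about 110 Da (a common rule of thumb, not a value given in the text) and uses the approximate SDS-PAGE sizes quoted above; the function and variable names are hypothetical.

```python
AVG_RESIDUE_MASS_DA = 110  # assumption: rule-of-thumb average mass per amino acid


def peptide_mass_from_bp(n_bp: int, residue_mass: int = AVG_RESIDUE_MASS_DA) -> int:
    """Approximate peptide mass (Da) encoded by an in-frame stretch of n_bp base pairs."""
    if n_bp % 3 != 0:
        raise ValueError("segment length must be a multiple of 3 to stay in frame")
    return (n_bp // 3) * residue_mass


FULL_LENGTH_DA = 77_000   # apparent size of full-length protein on SDS-PAGE
VARIANT_DA = 69_000       # apparent size of the delta-102-bp variant

peptide_loss = peptide_mass_from_bp(102)          # 34 residues
observed_difference = FULL_LENGTH_DA - VARIANT_DA
remainder = observed_difference - peptide_loss    # attributed to the lost N-linked glycan

print("peptide mass lost (Da):", peptide_loss)
print("observed difference (Da):", observed_difference)
print("remainder attributed to oligosaccharide (Da):", remainder)
```

Under these assumptions the 34 missing residues account for roughly 3.7 kDa, leaving about 4 kDa of the apparent 8-kDa shift, which is in line with the loss of the single N-linked oligosaccharide site located within the 102-bp region. Apparent masses from SDS-PAGE are approximate, so the split should be read as an estimate.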
Both membrane extracts were analyzed for their Km values for the artificial substrate [3H]GlcNAc-phosphomannose α-methyl and gave values (full-length, 0.4 mM and Δ102-bp variant, 0.43 mM) comparable with that of the pure bovine liver enzyme (0.49 mM) (6). The phosphodiester α-GlcNAcase activity in both membrane extracts showed a broad pH optimum between pH 5 and pH 7, again comparable to the bovine enzyme. The full-length protein eluted from a Superose 6 gel filtration column in the same position as the purified bovine liver enzyme, indicating that the expressed human enzyme is a homotetramer (data not shown).
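To see how little the reported Km differences matter in practice, the rates predicted at a given substrate concentration can be compared. This sketch assumes simple Michaelis-Menten kinetics and normalizes Vmax to 1 for every enzyme form; both are assumptions made for illustration, since the text reports only Km values. The names `mm_rate` and `KM_MM` are hypothetical.

```python
def mm_rate(substrate_mm: float, km_mm: float, vmax: float = 1.0) -> float:
    """Michaelis-Menten rate v = Vmax * [S] / (Km + [S]); concentrations in mM."""
    return vmax * substrate_mm / (km_mm + substrate_mm)


# Km values (mM) quoted in the text.
KM_MM = {
    "full-length": 0.40,
    "delta-102-bp variant": 0.43,
    "bovine liver enzyme": 0.49,
}

SUBSTRATE_MM = 0.4  # illustrative substrate concentration, chosen near the Km values

for name, km in KM_MM.items():
    fraction = mm_rate(SUBSTRATE_MM, km)
    print(f"{name}: v/Vmax = {fraction:.2f} at {SUBSTRATE_MM} mM substrate")
```

At a substrate concentration equal to Km the rate is half of Vmax by definition, and the three enzyme forms differ only modestly in fractional saturation at this concentration, which is what "comparable" Km values imply.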
A truncated form of the modified human cDNA was constructed by inserting a stop codon (W450stop) just before the transmembrane domain. When this construct was transfected into COS cells and compared with mock-transfected cells and cells transfected with full-length phosphodiester α-GlcNAcase cDNA, the truncated phosphodiester α-GlcNAcase was found secreted in the medium. In the experiment shown in Table IV, the COS cells were incubated in serum-free medium for 24 h following removal of the transfecting cDNA, and the medium and cells were harvested for enzyme assay. The endogenous COS cell phosphodiester α-GlcNAcase activity is predominantly (92%) found in the cell membrane fraction as is that encoded by the full-length human cDNA (67%), but the truncated human phosphodiester α-GlcNAcase was predominantly secreted into the medium (75%).

DISCUSSION
In these studies we have isolated the cDNA that encodes human phosphodiester α-GlcNAcase, which catalyzes the second step in the formation of the mannose 6-phosphate recognition signal on lysosomal enzymes. Although our cDNA clone is missing the initiator methionine and 5′-untranslated region, the human phosphodiester α-GlcNAcase genomic sequence, recently deposited in the database as part of a large cosmid sequence, has provided this missing sequence. This human genomic DNA was derived from human chromosome 16, and interestingly the gene for the γ subunit of GlcNAc phosphotransferase also is on chromosome 16 (footnote 2). In parallel we have sequenced a P1 clone of mouse genomic DNA that encodes most of the sequence of mouse phosphodiester α-GlcNAcase. The genomic organization of the mouse and human genes is the same, with both containing a single exon (exon 8) of 102 bp that is missing in human infant brain ESTs. This splice variant may be brain-specific since the Northern blot of mRNA from various human tissues revealed that brain mRNA for phosphodiester α-GlcNAcase is about 2.3 kb in size in contrast to the 2.4-kb mRNA of other tissues. The amino acid sequences encoded by the human and mouse phosphodiester α-GlcNAcase cDNAs are 80% identical.
The nucleotide sequence of the coding portion of the human genomic DNA for phosphodiester α-GlcNAcase is identical to that of the human liver clone 6.5 cDNA except for a single base pair (bp 1394) in the full-length sequence, which is T in clone 6.5 and C in the genomic sequence. This changes the Ile residue at amino acid 465 in clone 6.5 to a Thr residue in the genomic sequence. Interestingly, the mouse sequence encodes the Thr residue as does a human heart EST cDNA (clone 3NHCO336), whereas the human infant brain ESTs encode the Ile residue. These results suggest that the occurrence of T or C at base pair 1394 may represent a polymorphism and not sequencing errors.
Phosphodiester α-GlcNAcase acts in the Golgi (21), and the bovine enzyme is a membrane-spanning tetramer composed of two disulfide-linked dimers containing 68-kDa N-glycosylated monomers (6). The structural features revealed by the amino acid sequence show that the human phosphodiester α-GlcNAcase contains a hydrophobic signal sequence and signal peptide cleavage site at the NH2 terminus and another hydrophobic transmembrane region near the COOH terminus. This indicates that the enzyme is a type I membrane-spanning protein with the NH2 terminus in the lumen of the Golgi and the COOH terminus in the cytosol. This orientation is opposite to that of other cloned glycosyltransferases and glycosidases of the oligosaccharide processing pathway, which are type II membrane-spanning proteins (7). There are six potential N-linked oligosaccharide sites in the lumenal domain and, as revealed by a ProfileScan, a carboxyl-terminal cystine knot profile (22) (amino acids 307-390) including an epidermal growth factor-like domain (amino acids 362-389) occurs just prior to the 102-bp insert, which also encodes a very cysteine-rich domain.
Footnote 2: W. Canfield, unpublished observations.
Table III, footnote a: Antigen unit is an arbitrary number derived from densitometry of the bands on the Western blot in Fig. 6.
Fig. 6 (legend, partial): ... Table III were subjected to SDS-PAGE, blotted to nitrocellulose, and probed with anti-peptide antibody to the bovine enzyme.
These features indicate that the disulfide-bonded dimers of phosphodiester ␣-GlcNAcase are probably stabilized by a number of S-S bridges. This conclusion is also supported by the fact that on reducing SDS-PAGE the purified bovine liver enzyme showed some dimer band even after boiling for 5 min in 5% ␤-mercaptoethanol/SDS sample buffer (6). It is very interesting that the carboxyl-terminal cytosolic tail contains a potential tyrosine-based internalization signal YHPL that fits the consensus YXX ( is a bulky hydrophobic amino acid) first described by Canfield et al. (23) for the mannose 6-phosphate receptor and subsequently found for a number of other receptors that undergo endocytosis in coated pits from the plasma membrane prior to entry into the endosomal compartment (24). The trans-Golgi membrane marker protein TGN 38 (rodent) (25) and human TGN 46 (26) also contain such a tyrosine-based signal (YQRL) which is essential for their retrieval from the plasma membrane to the TGN. This raises the intriguing possibility that phosphodiester ␣-GlcNAcase travels to the plasma membrane and is retrieved to the Golgi apparatus. Such a traffic pattern is unusual for a Golgi-processing enzyme, but we already have evidence that the bovine enzyme is sialylated (6), a modification believed to occur solely in the TGN. The human phosphodiester ␣-GlcNAcase contains another potential endocytosis signal, the NPFKD sequence at its COOH terminus, which may also play a role in its intracellular trafficking. Tan et al. (15) have shown that the yeast type I integral membrane protein Kex2p, which resides in a late Golgi compartment, contains the endocytosis signal NPFXD in its cytoplasmic tail. When fused to a truncated form of the ␣-factor receptor Ste2p, the cytoplasmic tail of Kex2p mediated ␣-factor endocytosis that was dependent on the sequence NPFXD as demonstrated by alanine-scanning mutagenesis of the sequence. 
The endocytosis motif was active both in its normal internal location and at the COOH terminus of the cytoplasmic tail. Subsequently, Salcini et al. (16) showed that the EH (Eps15 homology) domain involved in protein-protein interactions binds in vitro to peptides containing an NPF motif. They also isolated a number of proteins that interacted with EH domains and found that all contained NPF motifs responsible for the binding. The direct interaction of the NPF motif with a binding pocket in the EH2 domain of Eps15 was shown by de Beer et al. (17), who solved the three-dimensional structure of the EH2 domain using heteronuclear magnetic resonance spectroscopy. Others have examined both the interaction of specific NPF-containing proteins with EH domains (18) and the peptide recognition specificity of EH domains from a variety of proteins (19). Most recently, Yamabhai et al. (20) isolated a new adaptor protein that they named intersectin because it contains two EH domains and five SH3 domains and thus can potentially bring together EH and SH3 domain-binding proteins in a macromolecular complex that is part of the endocytic machinery. Both EH domains of intersectin were shown to interact with NPF-containing peptides as well as with the mouse RAB protein, which contains four NPF motifs including a COOH-terminal NPFL. By using glutathione S-transferase (GST) fusion proteins of RAB and of RAB without the COOH-terminal NPFL, they showed that only the former fusion protein interacted with the intersectin EHa domain, indicating that in this case only the NPF motif at the COOH terminus can bind the EH domain. Furthermore, GST-TNPFL and GST-TNPFLA could bind the EH domain but GST-TNPFLAA could not, further emphasizing the importance of the carboxylate group in the interaction.
The COOH-terminal NPFKD sequence, conserved between human and mouse phosphodiester α-GlcNAcase, is therefore an attractive candidate for either an endocytosis signal acting at the plasma membrane or a retrieval signal acting at the TGN to return the enzyme to the cis/medial-Golgi. One could imagine it interacting with EH-containing cytosolic proteins at either site, which could initiate formation of intracellular trafficking vesicles.
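The two candidate sorting signals discussed above are short linear motifs, so locating them in a cytoplasmic tail can be illustrated with a simple pattern scan. The following is a minimal sketch rather than the method used in this work. It assumes that Φ can be approximated by the residues L, I, M, F, and V, and it uses a hypothetical tail in which only YHPL and the terminal NPFKD are taken from the text; the alanine spacers are placeholders and do not represent the real sequence. The function name `scan_tail` and the motif labels are likewise illustrative.

```python
import re

# Assumption: the bulky hydrophobic position (Phi) is approximated by L, I, M, F or V.
BULKY_HYDROPHOBIC = "LIMFV"

# Lookahead patterns, so overlapping occurrences are all reported.
MOTIFS = {
    "YXXphi": re.compile(rf"(?=(Y..[{BULKY_HYDROPHOBIC}]))"),
    "NPF": re.compile(r"(?=(NPF))"),
}


def scan_tail(tail):
    """Return (motif, 1-based start, matched residues, residues after the match)."""
    hits = []
    for name, pattern in MOTIFS.items():
        for match in pattern.finditer(tail):
            residues = match.group(1)
            start = match.start(1)
            residues_after = len(tail) - (start + len(residues))
            hits.append((name, start + 1, residues, residues_after))
    # Order hits by position along the tail.
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical tail: only YHPL and the COOH-terminal NPFKD come from the text.
toy_tail = "AAAAYHPLAAAANPFKD"
for hit in scan_tail(toy_tail):
    print(hit)
```

The last field of each hit gives the number of residues between the motif and the COOH terminus, which is the quantity varied in the GST-TNPFL, GST-TNPFLA, and GST-TNPFLAA comparison described above.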
When we inserted the complete coding sequence of human phosphodiester α-GlcNAcase in an expression vector (pcDNA 3.1(−)h Pα-G) and expressed this construct in COS cells, the membrane extracts from the cells expressed over 50 times the endogenous level of enzyme activity. The plasmid DNA encoding the full-length protein produced an enzyme that had the appropriate mobility on SDS-PAGE (77 kDa) for a 490-amino acid mature protein with 6 N-linked oligosaccharides. This was determined by Western blotting with an antibody raised to a 13-amino acid sequence from the amino-terminal peptide 1 of the purified bovine phosphodiester α-GlcNAcase as described previously (6). It should be pointed out that the sequence of bovine peptide 1 starts 24 amino acids downstream of the signal peptide cleavage site (following an arginine residue) in the human enzyme sequence. Thus the mature bovine liver enzyme as isolated had undergone an additional proteolytic clip that may account for the fact that on SDS-PAGE it had a mobility corresponding to 68-72 kDa. We also expressed two mutant constructs, one missing the 102-bp insert (Δ102-bp variant) and the other missing the transmembrane and cytosolic tail domains. In both cases, good expression and high enzyme activity were obtained, and the distribution of the activity indicated the following: 1) that the 102-bp insert is not required for activity or retention in the cell membrane, and 2) that the transmembrane and cytosolic tail are not required for activity but are required for retention in the membrane, since that truncated enzyme was recovered in the medium. A number of membrane-spanning enzymes of the Golgi require their transmembrane domains for retention, but there are some in which the luminal stem region plays a role in Golgi retention (see Ref. 7). Our finding with the truncated expressed human phosphodiester α-GlcNAcase was not surprising since Lee and Pierce (27) have described a soluble form of the enzyme in human serum.
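The statement that 77 kDa is an appropriate mobility for a 490-residue mature protein carrying six N-linked oligosaccharides rests on simple arithmetic, which can be sketched as follows. The domain lengths are those given in the abstract. The average residue mass of about 110 Da is a common rule of thumb and an assumption here, as are the simplifications that SDS-PAGE mobility equals true mass, that the whole mass difference is carbohydrate, and that all six sites are occupied. The function names are illustrative.

```python
# Domain lengths (residues) as stated in the abstract.
DOMAINS = {
    "signal sequence": 24,
    "luminal domain": 422,
    "transmembrane region": 27,
    "cytoplasmic tail": 41,
}

AVG_RESIDUE_MASS_KDA = 0.110  # assumption: ~110 Da per residue (rule of thumb)
OBSERVED_MASS_KDA = 77.0      # SDS-PAGE mobility of the expressed human enzyme
N_GLYCAN_SITES = 6            # potential N-linked sites; assumed all occupied


def mature_length(domains):
    """Residues remaining once the signal sequence has been cleaved."""
    return sum(domains.values()) - domains["signal sequence"]


def estimate_masses(domains):
    """Rough polypeptide and carbohydrate masses under the stated assumptions."""
    length = mature_length(domains)
    polypeptide = length * AVG_RESIDUE_MASS_KDA
    glycan_total = OBSERVED_MASS_KDA - polypeptide
    return {
        "mature_length": length,
        "polypeptide_kda": polypeptide,
        "glycan_total_kda": glycan_total,
        "per_site_kda": glycan_total / N_GLYCAN_SITES,
    }


for key, value in estimate_masses(DOMAINS).items():
    print(f"{key}: {value:.2f}")
```

Under these assumptions the polypeptide accounts for roughly 54 kDa, in the same range as the approximately 50 kDa observed for the bovine monomer after peptide N-glycosidase F digestion, leaving roughly 23 kDa to be attributed to carbohydrate.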
Our future studies will focus on analyzing the role of the various domains of expressed human phosphodiester α-GlcNAcase in the intracellular trafficking of the enzyme. We will also explore what role the alternatively spliced 102-bp exon 8 plays in phosphodiester α-GlcNAcase structure and function.