The alpha- and beta-subunits of the human UDP-N-acetylglucosamine:lysosomal enzyme N-acetylglucosamine-1-phosphotransferase [corrected] are encoded by a single cDNA.

Lysosomal enzymes are targeted to the lysosome through binding to mannose 6-phosphate receptors because their glycans are modified with mannose 6-phosphate. This modification is catalyzed by UDP-N-acetylglucosamine:lysosomal enzyme N-acetylglucosamine-1-phosphotransferase (GlcNAc-phosphotransferase). Bovine GlcNAc-phosphotransferase was isolated using monoclonal antibody affinity chromatography, and an alpha2beta2gamma2-subunit structure was proposed. Although cDNA encoding the gamma-subunit has been described, cDNAs for the alpha- and beta-subunits have not. Using partial amino acid sequences from the bovine alpha- and beta-subunits, we have isolated a human cDNA that encodes both the alpha- and beta-subunits. Both subunits contain a single predicted membrane-spanning domain. The alpha- and beta-subunits appear to be generated by a proteolytic cleavage at the Lys928-Asp929 bond. Transfection of 293T cells with the alpha/beta-subunits precursor cDNA with or without the gamma-subunit cDNA results in a 3.6- or 17-fold increase, respectively, in GlcNAc-phosphotransferase activity in cell lysates, suggesting that the precursor cDNA contains the catalytic domain. The sequence lacks significant similarity with any described vertebrate enzyme except for two Notch-like repeats in the alpha-subunit. However, a 112-amino acid sequence is highly similar to a group of bacterial capsular polymerases (46% identity). A BAC clone containing the gene, which spans 85.3 kb and comprises 21 exons, was sequenced and localized to chromosome 12q23. We now report the cloning of both the cDNA and genomic DNA of the precursor of GlcNAc-phosphotransferase. The completion of cloning all three subunits of GlcNAc-phosphotransferase allows expression of recombinant enzyme and dissection of lysosomal targeting disorders.

In higher eukaryotes most lysosomal hydrolases are targeted to the lysosome via a mannose 6-phosphate (M6P)-dependent pathway.
Before targeting, lysosomal enzymes are modified by the addition of M6P in a two-step reaction. In the first step, UDP-N-acetylglucosamine:lysosomal enzyme phosphotransferase (GlcNAc-phosphotransferase; EC 2.7.8.17) catalyzes the transfer of GlcNAc 1-phosphate from UDP-GlcNAc to the terminal or penultimate mannose on high mannose-type glycans of lysosomal hydrolases (1-3). The second enzymatic step occurs in the trans-Golgi network, where the covering GlcNAc is removed by N-acetylglucosamine-1-phosphodiester α-N-acetylglucosaminidase (EC 3.1.4.45), which has the trivial name "uncovering enzyme" (4,5). The lysosomal enzymes, now modified with M6P, bind to M6P receptors in the trans-Golgi network and are translocated to the endosome and subsequently to the lysosome. Recognition of lysosomal hydrolases by GlcNAc-phosphotransferase is the initiating step in lysosomal hydrolase trafficking. Lysosomal hydrolases that are known substrates for GlcNAc-phosphotransferase exhibit low Km values, whereas non-lysosomal glycoproteins bearing similar glycans have higher Km values (1,3,6).
GlcNAc-phosphotransferase was isolated using monoclonal antibody affinity chromatography from lactating bovine mammary glands and found to contain three distinct polypeptides. Based on the presence of disulfide-linked α2 and γ2 dimers and quantitative protein sequencing, an α2β2γ2-subunit structure with a molecular mass of 540 kDa was proposed (6). The cDNA and gene encoding the γ-subunit have been isolated and characterized (7), but the cDNA(s) or gene(s) for the α- and β-subunits have not been reported.
Lysosomal storage diseases are caused by a genetic deficiency of one or more lysosomal enzymes. Enzyme replacement therapy, the therapeutic administration of the missing lysosomal enzyme, is a proven or potential treatment strategy for many lysosomal storage diseases (8,9). The cation-independent M6P receptor is widely expressed on the surface of many cells and can be used for efficient receptor-mediated endocytosis of therapeutic enzymes (10-12). However, the enzyme must be modified with sufficient M6P to bind to the receptor with the appropriate affinity. Access to the cDNA(s) for the α- and β-subunits may allow reconstitution of phosphorylation in vitro as well as other strategies to improve the quality of lysosomal enzyme therapeutics.
GlcNAc-phosphotransferase is absent or unregulated in a group of diseases of lysosomal targeting termed mucolipidosis (13-17). GlcNAc-phosphotransferase activity is absent in mucolipidosis II (ML II; MIM 252500, I-cell disease), it is reduced in ML IIIA (MIM 252600; pseudo-Hurler polydystrophy), and in ML IIIC (MIM 252605, mucolipidosis III, variant form) it is unregulated. Mutations in the γ-subunit gene have been demonstrated to be the molecular basis of ML IIIC (7,18). Because cells from ML IIIC patients retain GlcNAc-phosphotransferase activity, whereas the substrate recognition is unregulated, the catalytic domain of the enzyme is predicted to reside on the α- and/or β-subunits of the enzyme. Cloning of the human α- and β-subunit gene will allow us to understand the genetic basis of ML II and ML IIIA.
In this paper we describe the cloning of a single cDNA and gene that encodes a precursor of the α- and β-subunits of human GlcNAc-phosphotransferase (gene symbol, GNPTAB). Transfection of the α/β-subunits precursor cDNA with or without co-transfection of the regulatory γ-subunit cDNA resulted in an increase in GlcNAc-phosphotransferase activity toward a low molecular weight substrate. This demonstrates that the α/β-subunits precursor cDNA encodes the catalytic domain of the enzyme.

EXPERIMENTAL PROCEDURES
Oligonucleotides, cDNA Libraries, Clones, Plasmid Vectors-Oligonucleotide primers were synthesized at the Molecular Biology Resource Facility at the University of Oklahoma Health Sciences Center. Mouse liver Marathon-Ready™ cDNA and human whole brain Marathon-Ready™ cDNA were purchased from Clontech (Palo Alto, CA). A human placental cDNA library in λgt10 (catalog 77399) was purchased from ATCC (Manassas, VA). SuperScript™ human brain cDNA library was purchased from Invitrogen. pCR2.1, pUC19, pcDNA 3.1(−), and pcDNA6/V5/His-A were purchased from Invitrogen.
Amino-terminal Sequencing of Protein and Peptides-Bovine GlcNAc-phosphotransferase was isolated as previously described (19) and subjected to non-reducing SDS-PAGE (20). The gel was either electroblotted onto a polyvinylidene fluoride membrane (21) or subjected to in-gel Lys-C digestion. The protein bands on the polyvinylidene fluoride membrane were stained for 1 min with 0.1% Coomassie Blue in 10% acetic acid, 50% methanol, and excised. Protein bands in the gel were stained, excised, and subjected to in-gel reduction, S-carboxyamidomethylation, and Lys-C digestion in 0.1 M Tris-HCl, pH 8.0, at 37°C overnight. The peptides were resolved by reverse phase chromatography on a C18 column equilibrated with 0.1% trifluoroacetic acid and developed with a linear gradient in acetonitrile. Individual peaks were examined by matrix-assisted laser desorption/ionization mass spectrometry to identify a peak containing a single mass. Strategies for peak selection, reverse phase selection, and Edman microsequencing have been described (22). Proteins on the polyvinylidene fluoride membrane and Lys-C-derived peptides were subjected to automated Edman degradation on an Applied Biosystems, Inc. (Foster City, CA) model 492 protein sequencer at the Molecular Biology Resource Facility at the University of Oklahoma Health Sciences Center.
3′-RACE for the Mouse β-subunit-A primer 5′-gaccgatgaaacaaaaggcaacc-3′ that spans 441-463 nucleotides upstream from the first amino acid of the mouse β-subunit in a mouse cDNA clone (GenBank™ accession number L36434) was used to amplify the 3′-end of the cDNA from mouse liver Marathon-Ready™ cDNA by nested 3′-RACE. Primers AP1 5′-ccatcctaatacgactcactatagggc-3′ and AP2 5′-actcactatagggctcgagcggc-3′ were used as reverse primers for the first and second PCR, respectively. The cycling parameters for the first PCR were 30 cycles of denaturation at 94°C for 30 s and annealing/extension at 68°C for 4 min. The second PCR used the same parameters for 20 cycles. An ~3.2-kb product was gel-purified, subcloned into pCR2.1, and sequenced on both strands.
Screening of Human Placental cDNA Library and 3′-RACE for Human GlcNAc-phosphotransferase cDNA-A 1.1-kb fragment was amplified from the partial mouse β-subunit cDNA by PCR using a forward primer 5′-agtttggttagcccagtgacacc-3′ and a reverse primer 5′-aaatcctgcaggttaaaggtaggtcgtg-3′ and used to screen a size-selected human placental cDNA library in λgt10 under reduced stringency (55°C, 2× SSC (1× SSC = 0.15 M NaCl and 0.015 M sodium citrate)). Inserts from positive clones were subcloned into pCR2.1 using the λgt10 LD-Insert Screening Amplimer Set (Clontech, Palo Alto, CA) and sequenced. The remaining portion of the cDNA was cloned by a combination of a walking strategy and RACE. The 3′-end of the cDNA was amplified from human whole brain Marathon-Ready™ cDNA by nested 3′-RACE, first using primers 5′-agtttggttagcccagtgacacc-3′ and AP1 5′-ccatcctaatacgactcactatagggc-3′, then using primers 5′-cccaaagaaaaacgcttcccgaag-3′ and AP2 5′-actcactatagggctcgagcggc-3′. The cycling parameters for the first PCR were 30 cycles of denaturation at 94°C for 30 s and annealing/extension at 68°C for 4 min. The second PCR used the same parameters for 20 cycles. A 3.4-kb product was gel-purified, subcloned into pCR2.1, and sequenced on both strands.
Construction of the Full-length cDNA Encoding Human GlcNAc-phosphotransferase α/β-subunits Precursor-A full-length human precursor cDNA was constructed by splicing three pieces of cDNA, the 1118-nucleotide fragment of I.M.A.G.E. Consortium clone 682681 (GenBank™ accession number AA204698), a 1703-nucleotide fragment from the placental clone, and a 2698-nucleotide fragment from the 3′-RACE product, by ligation at two BbsI restriction sites and subcloning into pUC19. The full-length cDNA (5.6 kb) was excised and subcloned into the NotI site of pcDNA 3.1(−). Instability of a 5-adenosine repeat (nucleotides 2925-2929) was identified as an occasional problem that results in deletion of 1 adenosine, introducing a frameshift. This problem was prevented by replacing a 215-bp fragment between the MfeI and DraIII sites with an MfeI- and DraIII-cleaved PCR product. The product was amplified by using the cDNA as a template with a forward superprimer 5′-gaagacacaattggcatacttcactgatagcaagaatactgggaggcaactaaaagatac-3′ and a reverse primer 5′-actgcatatcctcagaatgg-3′. The sequence aaaaa was replaced by aagaa (underlined in the forward primer sequence), a change that does not alter the amino acid sequence.
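The claim that the aaaaa-to-aagaa change is silent can be checked with a few lines of arithmetic. The sketch below is illustrative only and rests on two labeled assumptions: that the A of the initiation codon is nucleotide 165 of the cDNA (that is, a 164-nucleotide 5′-untranslated region), and that the substituted base is the third adenosine of the repeat (nucleotide 2927), with the following base being the t seen in the forward primer. The function and variable names are hypothetical.

```python
# Only the codons needed for this check (standard genetic code).
CODON_TABLE = {"AAA": "Lys", "AAG": "Lys", "AAT": "Asn"}

ATG_START = 165  # ASSUMPTION: position of the A of the ATG (164-nt 5'-UTR)


def codon_of(nt_pos, atg_start=ATG_START):
    """Return (codon number, position within codon 1-3) for a cDNA nucleotide."""
    offset = nt_pos - atg_start
    return offset // 3 + 1, offset % 3 + 1


def translate(dna):
    """Translate an in-frame DNA string using the minimal table above."""
    dna = dna.upper()
    return [CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3)]


# ASSUMPTION: the a->g change is the third base of the repeat (nt 2927).
codon_number, position = codon_of(2927)

original = "aaaaat"  # nt 2925-2930: five adenosines plus the next base
modified = "aagaat"  # same stretch as it appears in the forward primer

print(codon_number, position)   # codon index and wobble position
print(translate(original), translate(modified))
```

Under these assumptions the substituted base falls in the third position of a lysine codon, so both versions translate to Lys-Asn while the run of five adenosines is interrupted.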
Construction of Expression Plasmids for the Precursor-To generate expression plasmids for the precursor, the following modifications were made. The 3′-untranslated region was removed by further subcloning an ~3.8-kb NheI-BglII fragment from the cDNA between the NheI and BamHI sites of pcDNA6/V5/His-A. The sequence around the initiation codon was modified in an attempt to improve expression. An ~1.1-kb fragment between the NheI site in the polylinker and the XhoI site in the cDNA was replaced by NheI- and XhoI-cleaved PCR products amplified using various forward primers described below and a reverse primer, 5′-ctaaaggtaggcaagtggctc-3′. Forward primers were 5′-cgtggcgggctagccaccatgxyzttcaagctcctgcagagaca-3′, where xyz is ctg, ggg, gcg, and gtg, generating a second amino acid of Leu, Gly, Ala, and Val, respectively.
Screening of a Human Genomic BAC Library-A 107-bp fragment was amplified by PCR from a SuperScript™ human brain cDNA library using primers 5′-tgcagagacagacctatacctgcc-3′ and 5′-actcacctctccgaactggaaag-3′. The cycling parameters were 35 cycles of denaturation at 94°C for 1 min, annealing at 55°C for 1 min, and extension at 79°C for 1 min. The product was used as a probe for screening a human genomic BAC library by Genome Systems (St. Louis, MO). Four genomic clones were identified, and BAC 14951 was isolated and sequenced as described previously (23-25). Briefly, fragments in the 1-3-kbp range were randomly sheared from the purified BAC DNA (50 μg), blunt-ended, and subcloned into the SmaI site of pUC18. A random library was generated by transforming Escherichia coli strain XL1-Blue MRF′ (Stratagene, La Jolla, CA) by electroporation. Approximately 1200 colonies were picked from each transformation, and the sequencing templates were isolated by a cleared lysate-based protocol from the cultured library (24). Sequencing was performed as previously described (25). After base-calling with the ABI analysis software, the analyzed data were transferred to a Sun workstation cluster for base-calling and assembly using the Phred and Phrap programs (26,27), respectively. Overlapping sequences and contigs were analyzed using Consed (28). Gap closure and proofreading were performed using either custom primer-walking or PCR amplification of the region corresponding to the gap in the sequence followed by subcloning into pUC18 and cycle-sequencing with the universal pUC primers. In some instances additional synthetic custom primers were necessary to obtain at least 3-fold coverage for each base.
Northern Blot Analysis-A cDNA fragment corresponding to nucleotides 1252-3356 of the cDNA was excised by XhoI and EcoRV from the plasmid containing the full-length cDNA. The cDNA was labeled with [α-32P]dCTP using random hexamer priming with the Klenow fragment of DNA polymerase I (Readiprime II random prime labeling system, Amersham Biosciences) and used to probe a human MTN™ (multi-tissue northern) blot from BD Bioscience following the manufacturer's recommendation, except that all the washing steps were performed at 65°C. The membrane was exposed to a BIOMAX MS x-ray film (Eastman Kodak Co.) with an intensifying screen for 16 h at −80°C.
Expression of the Precursor in 293T Cells-The human embryonic kidney cell line 293T was grown in Dulbecco's modified Eagle's medium with 10% (v/v) fetal bovine serum at 37°C and 5% CO2, 95% air. Cells were transfected with empty vector or vector containing cDNA using FuGene-6 (Roche Applied Science). Cells were harvested by scraping at 72 h post-transfection and incubated in lysis buffer (50 mM Tris-HCl, pH 7.5, 150 mM NaCl, 1% Triton X-100, 2 mM MgCl2) on ice for 30 min (14). The lysate was clarified by centrifugation at 40,000 × g for 20 min, and the supernatant was assayed for GlcNAc-phosphotransferase activity as described below and for protein concentration by Bradford assay (Pierce).
Alignments of Deduced Protein Sequences-GlcNAc-phosphotransferase sequences derived from databases were assembled and translated. Deduced protein sequences were aligned using ClustalW (30) as implemented by MacVector™ 7.2 (Accelrys, San Diego, CA).

RESULTS
Bovine GlcNAc-phosphotransferase Protein Sequence-Bovine GlcNAc-phosphotransferase was purified 488,000-fold to apparent homogeneity as previously described (19), subjected to non-reducing SDS-PAGE, and transferred to a polyvinylidene fluoride membrane. The bands corresponding to the α-subunit dimer (320 kDa) and β-subunit (56 kDa) were excised and subjected to amino-terminal sequencing. Duplicate samples were subjected to in situ Lys-C digestion and fractionated by reverse phase high performance liquid chromatography, and several isolated peptides were subjected to amino-terminal microsequencing. The α-subunit bands generated a single amino-terminal sequence of MLLKLLQRQTY and internal peptide sequences of VPMLVLDXAXPTXVXLK and DLPSLYPSFLSASDVFNVAKPK. The β-subunit bands generated an amino-terminal sequence of DTFADSLRYVNKILNSKFGFTSRKVPAH and internal peptide sequences of ILNSK, TSFHK, FGFTSR, and SLVTNCKPVTDK. Two of the internal sequences, ILNSK and FGFTSR, overlap with the amino-terminal sequence.
Molecular Cloning of a Partial Mouse GlcNAc-phosphotransferase β-subunit cDNA-Bovine GlcNAc-phosphotransferase β-subunit peptide sequences were used to search the EST database using the TBLASTN algorithm (31). The search identified a mouse cDNA (GenBank™ accession number L36434) that was previously incorrectly annotated as a basic domain-leucine zipper transcription factor (32). The 1846-bp murine clone contained sequences highly homologous to the determined β-subunit amino-terminal and internal peptide amino acid sequences. The amino-terminal sequence of the bovine β-subunit began at nucleotide 799 of the mouse cDNA. The 798-bp sequence 5′ to this revealed an in-frame open reading frame that lacked evidence for either a signal peptide or an initiator methionine, suggesting the β-subunit might be derived from a larger precursor. The murine clone was extended by 3′-RACE as described under "Experimental Procedures," yielding an ~4600-bp partial cDNA. Attempts to extend the 5′-sequence by 5′-RACE, library screening, and database searching were unsuccessful.
Molecular Cloning and Characterization of Human GlcNAc-phosphotransferase α- and β-subunits Precursor cDNA-A 1.1-kb fragment from the mouse cDNA was used to screen a human placental cDNA library in λgt10 under reduced stringency. Several positive clones were isolated, and the clone with the largest insert (3.4 kb) was sequenced. The remaining portion of the cDNA was cloned by a combination of a walking strategy, 3′-RACE, and database searching. The placental sequence was used to design nested 3′-RACE primers that allowed the isolation of a 3′-RACE product from human brain Marathon-Ready™ cDNA. The 3426-nucleotide 3′-RACE product contained a 1761-nucleotide coding region and a 1665-nucleotide 3′-untranslated region. The placental sequence was also used to search the EST database using the BLASTN algorithm (31). The search identified I.M.A.G.E. clone 682681 (GenBank™ accession number AA204698), which contained a 1271-bp insert composed of a 164-nucleotide 5′-untranslated region and a 1106-nucleotide coding region that overlapped the 5′ portion of the placental sequence. Together, these strategies allowed the isolation of a full-length cDNA that spanned 5597 bp. The cDNA sequence for human GlcNAc-phosphotransferase has been deposited in GenBank™ under accession number AY687932.
Deduced Structure of GlcNAc-phosphotransferase α/β-subunits Precursor-The nucleotide and deduced amino acid sequences of human GlcNAc-phosphotransferase α/β-subunits precursor are shown in Fig. 1. The 5597-bp cDNA is predicted to encode a protein of 1256 amino acids. The cDNA appears to encode a precursor with the α-subunit in the 5′-position and the β-subunit in the 3′-position. Sequences highly homologous to all the amino-terminal and internal peptide sequences of both the α- and β-subunits of bovine GlcNAc-phosphotransferase are represented in the predicted human protein sequence (89 and 100% identity in the α- and β-subunits, respectively). The precursor protein has a predicted molecular mass of 143,582 Da, of which 104,706 and 38,894 Da are for the α- and β-subunits, respectively. The sequence contains 17 consensus sites for N-glycosylation in the α-subunit and 3 in the β-subunit. It also contains 22 cysteine residues, 18 in the α-subunit and 4 in the β-subunit.
The sequence surrounding the proposed initiation codon (ggggtgatgc) has a guanosine in position −3 and a cytidine in position +4, yielding a non-preferred translation initiation sequence (33). There is an in-frame stop codon 129 nucleotides upstream from the initiation codon. The cDNA has a single potential polyadenylation signal (ATAAAA) 13-18 nucleotides upstream from the poly(A) tail, although the signal was not the more commonly found AATAAA or ATTAAA.
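The three predicted masses quoted above are mutually consistent once the water added on hydrolysis of the α/β peptide bond is taken into account. The following sketch shows that arithmetic, together with the subunit lengths implied by a 1256-residue precursor cleaved after residue 928 (the Lys928-Asp929 site). The helper name is hypothetical and the water mass is rounded to 18 Da.

```python
WATER_DA = 18  # approximate mass of H2O, rounded to the nearest dalton

alpha_da = 104_706      # predicted alpha-subunit mass from the text
beta_da = 38_894        # predicted beta-subunit mass from the text
precursor_da = 143_582  # predicted precursor mass from the text


def uncleaved_mass(fragment_masses, water=WATER_DA):
    """Mass of the intact chain: sum of the fragments minus one water
    for each peptide bond that was hydrolyzed to produce them."""
    return sum(fragment_masses) - water * (len(fragment_masses) - 1)


precursor_length = 1256
alpha_length = 928                              # cleavage after Lys928
beta_length = precursor_length - alpha_length   # residues 929-1256

print(uncleaved_mass([alpha_da, beta_da]))  # compare with precursor_da
print(alpha_length, beta_length)
```

The two subunit masses sum to 18 Da more than the precursor, which is what a single proteolytic cleavage predicts.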
Membrane Topology of GlcNAc-phosphotransferase α- and β-subunits-Examination of a Kyte-Doolittle hydrophilicity plot (34) generated by MacVector™ 7.2 (Accelrys, San Diego, CA) of the predicted GlcNAc-phosphotransferase precursor sequence (Fig. 2) reveals prominent hydrophobic segments of 24 and 26 residues near the amino terminus and the carboxyl terminus, respectively. Because the amino terminus of the deduced human α-subunit sequence was similar to the amino-terminal sequence determined for the mature bovine α-subunit, the amino-terminal hydrophobic segment is evidently not cleaved, and the α-subunit therefore appears to contain a signal anchor. The hydrophobic segment in the carboxyl terminus is followed by a cluster of basic residues. This suggests that the GlcNAc-phosphotransferase α- and β-subunits have type II and type I topology, respectively.
Expression of GlcNAc-phosphotransferase α/β-subunits Precursor mRNA in Human Tissues-Northern blot analysis with a 2105-bp probe spanning both α- and β-subunit sequences revealed a single transcript in all human tissues examined (Fig. 3). The ~6.6-kb mRNA was slightly larger than the 5.6-kb cDNA isolated. This may be due to an unusually long poly(A) tail or an incomplete 5′-untranslated region. Longer autoradiography revealed a transcript of the same size in kidney, which is not visible in Fig. 3 (not shown). There was no evidence for individual α- or β-subunit RNA transcripts or alternative splicing.
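For readers unfamiliar with this type of analysis, a sliding-window hydropathy profile can be sketched as follows. This is not the MacVector implementation; the 19-residue window and the toy sequence are assumptions chosen for illustration, and only the per-residue values are the published Kyte-Doolittle scale. Note that a hydrophilicity plot simply inverts the sign, so hydrophobic segments appear as troughs rather than peaks.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window along the sequence."""
    scores = [KD[aa] for aa in seq.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


# HYPOTHETICAL toy sequence: charged flanks around a 24-residue
# hydrophobic core. It is not the GlcNAc-phosphotransferase sequence.
toy = "DDKKEE" + "L" * 24 + "RRKKDD"
profile = hydropathy_profile(toy)

peak = max(profile)
print(round(peak, 2), profile.index(peak))
```

Windows that lie entirely within the hydrophobic core score far higher than those over the charged flanks, which is the signature used to flag candidate membrane-spanning segments.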

Expression of Human GlcNAc-phosphotransferase in Mammalian Cells-A set of expression plasmids for human GlcNAc-phosphotransferase α/β-subunits precursor was constructed in pcDNA6/V5/His-A. The cDNA for the precursor was altered in an attempt to increase the expression level by replacing the native sequence surrounding the initiation codon (ggggtgatgc) with a preferred Kozak consensus sequence (33). The preferential consensus sequence (gccaccatgg) that contains guanosine at position +4 required changing the second amino acid from leucine (ctg) to glycine (ggg), alanine (gcg), or valine (gtg). The control plasmid with leucine had the sequence gccaccatgc, where the initiation codon is underlined.
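The positions referred to here are counted with the A of the initiation codon as +1, so −3 lies three bases upstream of the A and +4 is the base immediately after the G. The sketch below extracts those two bases from the three sequences given in the text and applies a simplified two-position rule (purine at −3, guanosine at +4). The function names and the reduction of the Kozak consensus to two positions are illustrative assumptions.

```python
def kozak_context(seq):
    """Return the bases at positions -3 and +4 relative to the first ATG,
    counting the A of the ATG as +1."""
    seq = seq.lower()
    start = seq.find("atg")
    if start < 3 or start + 3 >= len(seq):
        raise ValueError("need 3 bases upstream and 1 downstream of ATG")
    return seq[start - 3], seq[start + 3]


def matches_preferred(seq):
    """Simplified rule: purine at -3 and g at +4."""
    minus3, plus4 = kozak_context(seq)
    return minus3 in "ag" and plus4 == "g"


contexts = {
    "native": "ggggtgatgc",
    "Gly/Ala/Val constructs": "gccaccatgg",
    "Leu control": "gccaccatgc",
}

for name, seq in contexts.items():
    print(name, kozak_context(seq), matches_preferred(seq))
```

By this rule only the Gly, Ala, and Val constructs satisfy both positions; the native sequence and the Leu control both carry a cytidine at +4.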
When 293T cells were transfected with the precursor expression plasmids, GlcNAc-phosphotransferase activity at 72 h was increased 12-17-fold compared with the endogenous level in mock-transfected cell lysate (Fig. 4A). Interestingly, the highest activity was obtained when the second amino acid remained leucine. This suggested that leucine is preferred over other amino acids that better satisfy the Kozak consensus features at position +4.
Co-transfection of the precursor expression plasmid (Leu2) with the γ-subunit expression plasmid resulted in a 3.6-fold increase in GlcNAc-phosphotransferase activity when compared with the mock transfection (Fig. 4B). For comparison, when the precursor expression plasmid was co-transfected with empty vector, a 5.7-fold increase was observed. Similar results were obtained with the Gly2 expression plasmid, but the overall expression level was lower. The reduced activity toward α-methylmannoside upon co-transfection of the regulatory γ-subunit is consistent with the role of this subunit in lowering the Km for lysosomal enzymes.
Isolation and Sequencing of a Gene Encoding the Human GlcNAc-phosphotransferase α/β-subunits Precursor-Human BAC 14951 was isolated using a 107-bp probe that corresponded to nucleotides 181-287 in the cDNA as described under "Experimental Procedures" and contained a 177.4-kb insert encoding the full-length precursor gene as diagrammed in Fig. 5A. The sequence for BAC 14951 was deposited in GenBank™ and given accession number AC005409. This BAC maps to chromosome 12q23 by Mega Blast (35). The gene spans 85.3 kb and is distributed in 21 exons. The detailed structure of the gene, including the sequence of each intron-exon junction, size, and location in the chromosome framework, is provided in supplemental Fig. S1.
Inspection of the gene sequence showed that intron 1 is unusually large (33.8 kb); however, no known gene or pseudogene was found in that region. The nucleotide sequences at the 5′-donor and 3′-acceptor sites of all the introns conform to the GT...AG rule (36). There are only three nucleotides in the coding region of the human GlcNAc-phosphotransferase precursor cDNA that did not match the genomic sequence. One apparent polymorphism was identified (g2096a) that does not change the protein sequence. Two other nucleotide substitutions (g1338a and t2866a) result in amino acid substitutions, V329A and L901Q, respectively.
GlcNAc-phosphotransferases in Other Organisms-Comparison of the GlcNAc-phosphotransferase α/β-subunits precursor cDNA sequence with available databases allowed the identification of apparent GlcNAc-phosphotransferases in a variety of organisms. Putative full-length sequences were identified in Mus musculus, Dictyostelium discoideum, and Drosophila melanogaster. The deduced protein sequences are aligned with the human sequence in supplemental Fig. S2. Additional incomplete sequences were identified in Rattus norvegicus, Bos taurus, Gallus gallus, and Danio rerio; the deduced partial protein sequences from these organisms are aligned in supplemental Fig. S3. Two of the three unique bovine β-subunit protein sequences determined were found in the deduced partial bovine sequence. The third β-subunit sequence and the α-subunit sequences were from regions not covered by the partial sequence. Inspection of the aligned vertebrate sequences in supplemental Fig. S3 demonstrates that the proteins are highly homologous. Between human and mouse α-subunit sequences, amino acids 1-195 define a region with a remarkable 94% sequence identity. A second region from amino acids 207-740 demonstrates more than 88% sequence identity. A region of less homology then extends from amino acid 741 to the carboxyl terminus of the α-subunit at amino acid 928, except for a short region from amino acid 876 to 928 with >95% sequence identity. The carboxyl-terminal portion of the α-subunit sequence demonstrates the least similarity, and this region is absent from the D. discoideum and D. melanogaster sequences (supplemental Figs. S2 and S3). The β-subunit sequences are also highly homologous, ranging from 97% identity for the mouse-human comparison to 92% identity for the chicken-human comparison.
Sequence Motifs in GlcNAc-phosphotransferase-Comparison of the deduced GlcNAc-phosphotransferase sequences with the sequence databases identified two sequence motifs in the α-subunit (Fig. 6). A 113-amino acid sequence extending from amino acids 321 to 432 has similarity to a group of bacterial capsular polymerases. The human sequence demonstrates 46% sequence identity and 21% similarity with Neisseria meningitidis XcbA over this region (supplemental Fig. S4A). The aligned sequences are compared in Fig. 6. The α-subunit also contains two Notch-like repeats (pfam00066.11) that span amino acids 433-469 and 500-536. Although the overall similarity to the Notch repeat consensus sequence is only 47 and 46% for repeats one and two, respectively, the positions of the six cysteines satisfy the Notch consensus for spacing. Twelve of the 18 cysteines in the α-subunit are found in the Notch-like repeats (supplemental Fig. S4B).
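Percent identity figures of this kind are simply the fraction of aligned positions that carry the same residue. As a small worked illustration, the sketch below compares the bovine α-subunit amino-terminal peptide (MLLKLLQRQTY) with the first 11 human codons. The human stretch is a reconstruction, not a sequence printed in the text: it is assembled here by joining the coding part of the Leu expression primer to the BAC probe primer at their overlap, and the codon table is limited to the codons that occur. All names are hypothetical.

```python
# Standard genetic code, restricted to the codons used below.
CODONS = {
    "atg": "M", "ctg": "L", "ctc": "L", "ttc": "F", "aag": "K",
    "cag": "Q", "aga": "R", "acc": "T", "tat": "Y",
}


def join_at_overlap(left, right):
    """Join two sequences at the longest suffix/prefix overlap."""
    for k in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:k]):
            return left + right[k:]
    raise ValueError("no overlap found")


def translate(dna, n_codons):
    """Translate the first n_codons codons of an in-frame sequence."""
    return "".join(CODONS[dna[3 * i:3 * i + 3]] for i in range(n_codons))


def percent_identity(a, b):
    """Identity over an ungapped, equal-length alignment."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return matches, 100.0 * matches / len(a)


# Coding portion of the forward expression primer with xyz = ctg (Leu).
primer_coding = "atgctgttcaagctcctgcagagaca"
# Forward primer used to amplify the 107-bp BAC screening probe.
probe_primer = "tgcagagacagacctatacctgcc"

human_nterm = translate(join_at_overlap(primer_coding, probe_primer), 11)
bovine_nterm = "MLLKLLQRQTY"

matches, identity = percent_identity(human_nterm, bovine_nterm)
print(human_nterm, matches, round(identity, 1))
```

Over this short stretch the two species differ at a single position (Phe versus Leu at residue 3); the figures quoted in the text are computed in the same way over much longer alignments.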

DISCUSSION
We isolated a single cDNA and gene encoding both the α- and β-subunits of human GlcNAc-phosphotransferase. Using purified bovine GlcNAc-phosphotransferase, we previously identified three different subunits and proposed that they were in an α2β2γ2 configuration (6). The model for the 540-kDa complex was based on the presence of disulfide-linked α2 and γ2 dimers. The stoichiometry of the β-subunits was determined by quantitative protein sequencing that demonstrated a 1:1 molar ratio of β-subunit to γ-subunit. The identification of a single cDNA encoding the α- and β-subunits suggests that these two subunits are derived by proteolytic processing of a polyprotein precursor. This finding further supports the model containing two β-subunits per α2 dimer. The amino acid sequence surrounding the proposed α/β cleavage site did not match consensus sequences for known proteases (MacVector™ 7.2, Accelrys, San Diego, CA). The protease and the exact site of the cleavage remain undetermined.
Our previous finding that the γ-subunit is mutated in patients with mucolipidosis IIIC (mucolipidosis III, variant form) (7), a disease where GlcNAc-phosphotransferase activity is present but unregulated, suggested the catalytic domain was localized in the α- or β-subunits. Transfection of mammalian cells with an expression plasmid encoding only the precursor increased GlcNAc-phosphotransferase activity in cell lysate 12-17-fold. In this experiment GlcNAc-phosphotransferase activity was determined with an assay measuring transfer of [β-32P]GlcNAc 1-phosphate from [β-32P]UDP-GlcNAc to α-methylmannoside, an assay that does not require active γ-subunit. Taken together, these results demonstrate the localization of the catalytic domain in the α- and/or β-subunits. Co-transfection of the γ-subunit cDNA also results in GlcNAc-phosphotransferase activity, but this is reduced, consistent with the known regulatory function of the γ-subunit (7).
Further evidence that we have cloned the catalytic domain of the enzyme is provided by our finding of a motif common to bacterial capsular polymerases. This 113-amino acid motif was found between amino acids 321 and 432 of the α-subunit (Fig. 6 and supplemental Fig. S4A). This subset of bacterial capsular polymerases is a group of enzymes that generate capsular polysaccharide composed of repeating HexNAc 1-phosphate units, such as GlcNAc 1-phosphate or ManNAc 1-phosphate (37-39). The finding of a similar region in GlcNAc-phosphotransferase, which also transfers GlcNAc 1-phosphate, suggests that this region may be functionally related and perhaps the result of a gene transfer event. This region might be involved in the binding of nucleotide sugar or transfer of sugar phosphate. Interestingly, no homology was observed between GlcNAc-phosphotransferase and UDP-N-acetylglucosamine:dolichyl-phosphate N-acetylglucosamine phosphotransferase 1 (EC 2.7.8.15; DPAGT1, GlcNAc-1-P transferase), which also transfers GlcNAc 1-phosphate, suggesting that these enzymes do not have a common origin.
GlcNAc-phosphotransferase is a membrane-bound enzyme. Both the α- and β-subunits contain predicted transmembrane domains. Because the γ-subunit sequence does not predict a transmembrane domain, the retention of GlcNAc-phosphotransferase in the endoplasmic reticulum and cis-Golgi may be controlled by the transmembrane domains in the α- and β-subunits. The process of assembling the γ-subunit with the α- and β-subunits is still unclear. Although the γ-subunit showed limited similarity to endoplasmic reticulum glucosidase II β-subunit, no homology was found between the α- or β-subunits of GlcNAc-phosphotransferase and endoplasmic reticulum glucosidase II α-subunit.
A second region of sequence homology comprises two Notch-like repeats between amino acids 433-469 and 500-536 (Fig. 6 and supplemental Fig. S4B). Unlike Notch proteins, where there are three tandem repeats, only two repeats were found in the α-subunit of GlcNAc-phosphotransferase. These two Notch-like repeats were separated by ~30 amino acids, whereas no such separation is found in Notch proteins. In Notch proteins each repeat contains six cysteines that form three disulfide bonds between the first and fifth, the second and fourth, and the third and sixth cysteines (pfam00066.11, Notch). Notch repeats are reported to be involved in negative regulation of Notch proteins (40,41). The role of the Notch-like repeats in GlcNAc-phosphotransferase, if any, is unknown.
Of the 18 cysteines in the α-subunit sequence, 12 are likely to be involved in Notch-related disulfide bonds. Two of the other 6 cysteines are in the amino-terminal membrane anchor domain, and 1 at position 562 is not conserved in the mouse homologue. Therefore, cysteines at positions 70, 128, and/or 133 appear most likely to be involved in the observed intermolecular disulfide linkage(s) between the two α-subunits. The β-subunit contains 4 cysteines, all of which are presumed to be in intramolecular disulfide linkages, since the β-subunits are not disulfide-linked to each other or to another subunit.
Sequences homologous to human GlcNAc-phosphotransferase α/β-subunits precursor protein were found both in vertebrates (supplemental Fig. S3) and invertebrates (supplemental Fig. S2). In vertebrates, an apparent full-length sequence was identified in mouse, and partial sequences were identified in rat, cow, chicken, and zebrafish. These vertebrate sequences are highly homologous to each other and to the human sequence. In invertebrates, two putative full-length sequences were found in slime mold (D. discoideum) and fruit fly (D. melanogaster). Interestingly, these invertebrate sequences have shorter α-subunits than human or mouse GlcNAc-phosphotransferase, lacking the carboxyl-terminal portion (Fig. 6 and supplemental Fig. S2). However, the sequences similar to bacterial capsular polymerase and Notch-like repeats are present (Fig. 6). The vertebrate and invertebrate β-subunit sequences are more similar (supplemental Fig. S2). It remains to be demonstrated that these similar sequences represent invertebrate GlcNAc-phosphotransferases.
The gene for GlcNAc-phosphotransferase precursor was localized to human chromosome 12q. This is in contrast to an abstract suggesting the gene for ML II (I-cell disease) resides at 4q21-4q23 (42). We found no evidence for an assignment to chromosome 4 and believe the conclusion of the previous report is in error. Recently, we reported a mutation in the precursor gene in an ML IIIA (pseudo-Hurler polydystrophy) patient who had a very low level of GlcNAc-phosphotransferase activity (43). Mutations in the precursor gene have been identified in additional ML II and ML IIIA patients.
In summary, we cloned the cDNA and gene encoding the precursor of GlcNAc-phosphotransferase α/β-subunits. This completes the cloning of both the cDNA and genomic DNA of all three subunits of GlcNAc-phosphotransferase. This allows the expression of recombinant GlcNAc-phosphotransferase and in vitro phosphorylation of lysosomal hydrolases, as we report later. Additionally, the genomic structure allows us to dissect the relationships between the complex and related disorders of lysosomal targeting, mucolipidosis II (I-cell