O-Linked GlcNAc transferase is a conserved nucleocytoplasmic protein containing tetratricopeptide repeats.

O-Linked GlcNAc addition and phosphorylation may compete for sites on nuclear pore proteins and transcription factors. We sequenced O-linked GlcNAc transferase from rabbit blood and identified the homologous Caenorhabditis elegans transferase gene on chromosome III. We then isolated C. elegans and human cDNAs encoding the transferase. The enzymes from the two species appear to be highly conserved; both contain multiple tetratricopeptide repeats and nuclear localization sequences. The C. elegans transferase accumulated in the nucleus and in perinuclear aggregates in overexpressing transgenic lines. O-Linked GlcNAc transferase activity was also elevated in HeLa cells transfected with the human cDNA. At least four human transcripts were observed in the tissues examined ranging in size from 4.4 to 9.3 kilobase pairs. The two largest transcripts (7.9 and 9.3 kilobase pairs) were enriched at least 12-fold in the pancreas. Based on its substrate specificity and molecular features, we propose that O-linked GlcNAc transferase is part of a glucose-responsive pathway previously implicated in the pathogenesis of diabetes mellitus.

Over the last 10 years, a novel post-translational modification involving the addition of a single N-acetylglucosamine in O-glycosidic linkage to serine or threonine residues on cytoplasmic and nuclear proteins has been identified (1,2). We became interested in this form of glycosylation when it was found to modify a group of nuclear pore proteins, which have since been molecularly characterized (2-4). In addition to nuclear pore proteins, O-linked GlcNAc modifies a large number of polypeptides in multimeric structures, including RNA pol II transcription complexes and the p67/eIF-2 initiation factor in the translation machinery.
Although the addition of O-linked GlcNAc to proteins is formally a glycosyltransferase reaction, it is quite distinct from other forms of glycosylation. The addition occurs in the cytosol and nucleus, unlike other glycosylation reactions, which are restricted to the endomembrane system of the cell (2,3). Because the transferase must function in the cytoplasm, where UDP-GlcNAc levels are lower, it has a much lower Km for this substrate than is usually observed for glycosyltransferases. In many ways O-linked GlcNAc addition is analogous to protein phosphorylation. The enzyme that catalyzes O-linked GlcNAc addition, uridine diphospho-N-acetylglucosamine:polypeptide β-N-acetylglucosaminyl transferase (O-GlcNAc transferase, OGT), has been shown to recognize a large number of phosphoproteins, some of which play a direct role in signal transduction. In the case of RNA polymerase II, these modifications seem to be mutually exclusive: the glycosylated enzyme is necessary for assembly of the preinitiation complex, whereas subsequent deglycosylation and phosphorylation are necessary for transition to the elongation complex (5). In the case of other substrates, such as neurofilaments (6) or the nuclear pore proteins Nup62, -97, and -200 (7), it appears that phosphorylation and glycosylation can coexist on the same molecule. The role of protein phosphorylation as a regulatory mechanism for signal transduction in eukaryotic cells was originally identified in studies over 40 years ago on glycogen phosphorylase, an enzyme involved in carbohydrate metabolism (8). It is likely that O-linked GlcNAc addition to proteins in the cytoplasm and nucleus is also highly regulated. Since phosphorylation and glycosylation both target similar serine or threonine residues, the two processes may compete directly for sites, or each may alter the substrate specificity of nearby sites by steric or electrostatic effects.
No strict consensus sequence for O-linked GlcNAc addition has so far been identified, although most glycosylation sites occur near proline or valine residues and typically in stretches rich in serine or threonine residues. A subset of glycosylation sites is located near acidic amino acid residues (9,10). These glycosylation sites are similar to phosphorylation sites for several protein kinases (11). We have previously shown that OGT has a much higher affinity for the recombinant nucleopore protein, Nup62, than for any synthetic peptide, suggesting that the enzyme may recognize other parts of the protein and not just a specific consensus sequence (10).
Although OGT has been purified from several different sources (10,12), it has not been molecularly cloned. Here we describe the purification of rabbit OGT and the use of the resulting sequence data to clone the Caenorhabditis elegans and human enzymes. The cloned enzymes are highly conserved; both contain multiple tandem tetratricopeptide repeats (TPR) and putative nuclear localization sequences. HeLa cells transiently transfected with the human enzyme had elevated OGT activity. Polyclonal antiserum prepared against the recombinant OGT was used to localize the enzyme to the nucleus and perinuclear aggregates in transgenic C. elegans embryos. Human OGT transcripts were observed in all tissues examined, but the highest levels of expression were observed in the pancreas.

EXPERIMENTAL PROCEDURES
The nucleotide sequences reported in this paper have been submitted to the GenBank™/EBI Data Bank with accession numbers U77412 and U77413.
Purification of Rabbit O-GlcNAc Transferase-Fresh rabbit blood (4 liters), anticoagulated with EDTA (Pel-Freez), was pelleted in a GS3 rotor at 2,000 × g for 5 min. The red blood cells were washed three times with an isotonic salt solution (140 mM NaCl, 5 mM KCl, 1.5 mM magnesium acetate) and collected after centrifugation at 2,000 × g for 5 min for the first two washes and 5,000 × g for 10 min after the final wash. Hypotonic lysis was performed using an equal volume of ice-cold water containing the following protease inhibitors (Boehringer Mannheim): 1 mM phenylmethylsulfonyl fluoride, 10 µg/ml chymostatin, 10 µg/ml pepstatin, 10 µg/ml leupeptin, 0.1% aprotinin, and 2 mM EDTA. The lysate was pelleted at 10,000 × g for 40 min in a GSA rotor. The soluble fraction was made 30% saturated with ammonium sulfate by adding a stock of 100% saturated ammonium sulfate equilibrated at 4°C slowly over 1 h and stirring the solution an additional 2 h at 4°C. The precipitate was collected after centrifugation at 10,000 × g for 40 min in a GSA rotor and resuspended in 15-20 ml of 50 mM Tris-HCl, pH 7.4, 2 mM MgCl₂ using a Dounce homogenizer. The insoluble material was removed by centrifugation at 20,000 × g for 20 min in a SS34 rotor. The soluble fraction from the 30% ammonium sulfate precipitation was loaded onto a 15-ml phenyl-Sepharose column (Pharmacia Biotech Inc.), washed with 100 ml of 10 mM Tris-HCl, pH 7.5, 100 mM ammonium sulfate, and eluted with 40 ml of 10 mM Tris-HCl, pH 7.5, 60% ethylene glycol. All chromatography buffers also contained the following protease inhibitors: 0.1% aprotinin, 10 µg/ml leupeptin, 10 µg/ml pepstatin, and 0.1 mM phenylmethylsulfonyl fluoride; all procedures were performed at 4°C. The active fractions (15-20 ml) were pooled, passed through a 0.45-µm Millex-HA filter, and loaded onto a Mono Q HR 10/10 anion exchange column equilibrated with 50 mM Tris-HCl, pH 7.5, 12.5 mM MgCl₂, 20% glycerol, 2 mM EDTA using a Pharmacia FPLC system. The column was washed with 30 ml of the equilibration buffer and then eluted with a linear gradient from 0 to 300 mM NaCl in 50 ml of equilibration buffer at a flow rate of 1 ml/min.
The active fractions (8-10 ml) were pooled and concentrated to a final volume of 0.3 ml using a Centricon-30 microconcentrator (Amicon) and loaded in 0.15-ml aliquots onto a Superose 6 FPLC column equilibrated with 50 mM Tris-HCl, pH 7.5, 12.5 mM MgCl₂, 20% glycerol, 2 mM EDTA, 100 mM NaCl. The column was run at a flow rate of 0.15 ml/min, and 0.6-ml fractions were collected. Protein was determined with the BCA reagent (Pierce) using bovine serum albumin as a standard. O-GlcNAc transferase activity was measured using recombinant Nup62 bound to nitrocellulose membranes as described previously (10) or in a modification of the method using recombinant Nup62 bound to ScintiStrip polystyrene scintillation strips (Wallac). A typical purification results in a 30,000-fold purification and a 1-2% yield. Purified O-GlcNAc transferase was subjected to sodium dodecyl sulfate-polyacrylamide gel electrophoresis, and the 110-kDa band was cut out and sent to the William M. Keck Foundation at Yale University for in-gel trypsin digestion, high pressure liquid chromatography purification, and amino acid sequencing.
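The fold-purification and yield figures quoted above follow from simple ratios of specific activity and total activity. The sketch below illustrates that arithmetic only; the step names, protein amounts, and activity units are hypothetical values chosen to land in the reported range, not data from this purification.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One row of a purification table (all values here are hypothetical)."""
    name: str
    total_protein_mg: float       # e.g. from a BCA assay
    total_activity_units: float   # arbitrary enzyme activity units

    @property
    def specific_activity(self) -> float:
        # Activity per mg of protein
        return self.total_activity_units / self.total_protein_mg


def fold_purification(start: Step, end: Step) -> float:
    """Ratio of specific activities between two steps."""
    return end.specific_activity / start.specific_activity


def percent_yield(start: Step, end: Step) -> float:
    """Percentage of the starting total activity recovered."""
    return 100.0 * end.total_activity_units / start.total_activity_units


# Hypothetical numbers, chosen only to illustrate the calculation
lysate = Step("hypotonic lysate", total_protein_mg=60000.0, total_activity_units=1000.0)
final = Step("gel filtration pool", total_protein_mg=0.03, total_activity_units=15.0)

print(f"fold purification: {fold_purification(lysate, final):.0f}")
print(f"yield: {percent_yield(lysate, final):.1f}%")
```

A low yield combined with a very high fold purification, as in this illustration, is what one expects when a rare enzyme is recovered from a lysate dominated by a few abundant proteins.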
Cloning of the C. elegans O-GlcNAc Transferase-Polymerase chain reaction (PCR) primers GTTTGTTACTTGAAAGCAATCG and ATCGAAAATCCTGGCCTCTT were made to amplify a 195-base pair fragment from the cDNA clone yk13c2 (Fig. 2A). This PCR fragment was gel-purified and used to probe a ZAP C. elegans cDNA library (10¹⁰ units/ml) (13). A total of 140,000 clones were screened in this manner, and only 1 positive plaque was identified. The identified insert (3.1 kb) was subcloned into pGem and pET 32 using EcoRI. This insert was sequenced and localized to the C-terminal 70% of the open reading frame of CelK04G7.3. Using the known sequence for the open reading frame of the CelK04G7.3 gene, primers were constructed to make the 5′ end using high fidelity Takara Ex Taq DNA polymerase. The PCR fragment was cloned into the HindIII site in the original clone isolated from the ZAP library, yielding cDNA clone ZAP-CeOGT (GenBank™ accession number U77412).
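As a quick sanity check on primers such as the two listed above, basic properties can be computed directly from the sequences. The following is a minimal sketch, assuming the rule-of-thumb Wallace formula (2°C per A/T, 4°C per G/C) as a rough melting temperature estimate; that formula is a swapped-in approximation for short oligonucleotides and is not the method used in this work. The labels primer_1 and primer_2 are arbitrary, since the text does not state which primer is forward.

```python
# Translation table for complementing unambiguous DNA bases
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Rule-of-thumb melting temperature in degrees C: 2*(A+T) + 4*(G+C)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


# Primer sequences as given in the text; the labels are arbitrary
primers = {
    "primer_1": "GTTTGTTACTTGAAAGCAATCG",
    "primer_2": "ATCGAAAATCCTGGCCTCTT",
}

for label, seq in primers.items():
    print(label, len(seq), f"GC={gc_fraction(seq):.2f}", f"Tm~{wallace_tm(seq)}C",
          reverse_complement(seq))
```

The two estimates differ by only a couple of degrees, which is the usual design goal for a primer pair, although nearest-neighbor methods give more reliable values for oligonucleotides of this length.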
In Vitro Translation and Expression in E. coli-In vitro translation was performed using the TNT T7-coupled wheat germ extract system (Promega) according to the manufacturer's instructions. The full-length C. elegans cDNA (ZAP-CeOGT) was cloned into pET32a and transformed into BL21(DE3) cells for expression. Cells were grown in Luria-Bertani medium containing 50 µg/ml carbenicillin at 37°C and 220 rpm until A₆₀₀ = 0.6. Cells were induced with 1 mM isopropyl-1-thio-β-D-galactopyranoside for 90 min at 37°C and harvested by centrifugation at 3000 rpm for 5 min at 4°C in a Beckman GS-6R centrifuge. After resuspension in 0.1 volume of 50 mM Tris-HCl, pH 8, 2 mM EDTA, 100 µg/ml lysozyme, 0.1% Triton X-100, cells were incubated at 30°C for 15 min, placed in an ice bath, and sonicated two times for 10 s to shear the DNA. The O-GlcNAc transferase was pelleted at 12,000 × g for 10 min at 4°C and solubilized with His-Tag (Novagen) binding buffer in 6 M urea (5 mM imidazole, 50 mM NaCl, 20 mM Tris-HCl, pH 7.9). After centrifugation at 12,000 × g for 10 min at 4°C, the solubilized protein was loaded onto a 2.5-ml His-Tag column, washed with 8 ml of binding buffer, and eluted with 8 ml of elution buffer in 6 M urea (60 mM imidazole, 50 mM NaCl, 20 mM Tris-HCl, pH 7.9). Column fractions were analyzed by sodium dodecyl sulfate-polyacrylamide gel electrophoresis. Full-length OGT was gel-purified and used to generate polyclonal antibodies in guinea pigs.
Immunofluorescence Localization of O-GlcNAc Transferase in C. elegans Embryos-C. elegans embryos were applied to glass slides and fixed with formaldehyde (14). OGT was visualized by indirect immunofluorescence using the guinea pig antiserum raised against recombinant C. elegans OGT, followed by a fluorescein isothiocyanate-labeled goat anti-guinea pig antibody. The immunofluorescence was detected using a Bio-Rad 1024 confocal microscope equipped with a 60× objective.
Transgenic C. elegans Strains-Transgenic strains were generated by microinjection using the pRF4 plasmid as a marker to identify transformed animals (15). Test plasmid constructs were injected in combination with pRF4 DNA at 50 ng/µl each.
Overexpression of the OGT gene was achieved by transformation of N2 animals with derivatives of the heat shock promoter vectors (pPD49.78 and pPD49.83) (15) in which the full-length C. elegans OGT cDNA (NcoI-SacI partial digest, 4.25 kb) was cloned into the NcoI and SacI restriction sites of the vector. Transgenic animals were heat-shocked at 33°C for 2-4 h to induce production of fusion proteins driven by heat shock promoters. Overexpressed OGT was detected by immunoblotting using an anti-OGT guinea pig antibody raised against the recombinant protein made in E. coli.
Cloning of the Human O-GlcNAc Transferase-Taking advantage of the published sequence of a human expressed sequence tag, accession number R75943 (Fig. 2B), two oligonucleotide primers (GCGTTTTCCAGCAGTAGGAG and ACATTCTGAAGCGTGTTCCC) were constructed and used to screen SuperScript human brain and liver cDNA libraries using the Genetrapper cDNA-positive selection system. The first primer was biotinylated at the 3′ end with biotin-14-dCTP using terminal deoxynucleotidyl transferase and used to screen single-stranded human liver or brain cDNA libraries. Hybrids between the biotinylated oligonucleotide and the cDNA libraries were captured on streptavidin-coated paramagnetic beads and retrieved using a magnet. The captured ssDNA was separated from the biotinylated primer, repaired to double-stranded DNA using the second oligonucleotide primer, and transformed into ElectroMAX DH10B cells. A total of 48 liver and 53 brain clones were identified on the initial screen. These clones were then rescreened by hybridization with the full-length human placenta expressed sequence tag, accession number R75943, and 40/48 liver and 42/53 brain clones were found to be positive. The insert size was estimated by restriction digestion with SalI and NotI. All liver clones > 2.5 kb and brain clones > 3 kb were screened by in vitro translation. The largest in vitro translation product identified was a protein of about 100 kDa formed by 6 different liver and 2 brain clones (data not shown). DNA sequencing showed that they were all overlapping clones of the same gene with variable 5′ and 3′ untranslated regions. The full-length clone Lv4F was fully sequenced and is reported here (GenBank™ accession number U77413).
Filter Hybridization-Human, rabbit, rat, and mouse genomic DNA (Clontech) was digested overnight with EcoRI, separated (3 µg/lane) on a 0.7% agarose gel, and transferred to nylon membranes (GeneScreen Plus, DuPont) by capillary action. The blot was prehybridized in 1% bovine serum albumin, 0.5 M NaPO₄, pH 7, 1 mM EDTA, 7% sodium dodecyl sulfate, 100 µg/ml denatured salmon testis DNA at 55°C for 1 h and then hybridized overnight at 55°C with the gel-purified, radiolabeled 3-kb NotI-SalI fragment from human liver clone Lv4F. The blot was washed two times for 15 min with 0.5% bovine serum albumin, 5% sodium dodecyl sulfate, 40 mM NaPO₄, pH 7, 1 mM EDTA at 55°C; two times for 15 min with 1% bovine serum albumin, 40 mM NaPO₄, pH 7, 1 mM EDTA at 55°C; and once with 0.2× SSPE (30 mM NaCl, 2 mM NaPO₄, pH 7.4, 0.2 mM EDTA) at 55°C for 15 min. It was exposed to Kodak Bio-Max MR film for 1-7 days at −70°C. A human multiple tissue Northern blot (Clontech) was prehybridized, hybridized, and washed as described for the Southern blot except that all incubations were performed at 65°C.
Expression of Human O-GlcNAc Transferase in Transiently Transfected HeLa Cells-Human OGT clone Lv4F was introduced into HeLa cells by lipid-mediated transfection or by electroporation. For the lipid-mediated method, 10⁵ cells were plated per well in six-well plates in Dulbecco's modified Eagle's medium, 10% fetal bovine serum for 14-18 h prior to transfection. The transfection was carried out in Opti-MEM (Life Technologies, Inc.). The plasmid pECE-OGT/Lv4F (0.1 µg) was mixed with 4 µl of Lipofectin reagent (Life Technologies, Inc.) and applied to the cells according to the manufacturer's recommendations. Control cells were transfected with plasmid bearing no insert. Electroporation of HeLa cells was performed in Opti-MEM in cell suspensions (5 × 10⁶/ml) containing 0.5 µg/ml pECE-OGT/Lv4F or pECE. Cells were shocked at 4°C, with capacitance set at 1180 microfarads and voltage at 200 V, using a Life Technologies, Inc. electroporator. Following 1-2 min on ice after the shock, cells were diluted and plated in Dulbecco's modified Eagle's medium, 10% fetal bovine serum. Transfection efficiencies for both transfection methods were estimated using a plasmid that encodes green fluorescent protein (pGreenLantern; Life Technologies, Inc.) and were typically 10-20%. Cells were harvested 24 h after transfection by either method, lysed by sonication, and centrifuged. The supernatant fraction was assayed for OGT enzyme activity using ScintiStrip wells (Wallac) precoated with Nup62. Assays were performed in 50 mM Tris-HCl, pH 7.4, 12.5 mM MgCl₂ and 1 µCi of UDP-GlcNAc-[³H]GlcNAc in a final volume of 40 µl for 90 min at 37°C and 220 rpm.

RESULTS
Isolation and Purification of Mammalian OGT-A mammalian OGT was purified from rabbit blood using a modification of previously described methods (10,12). The purified enzyme shown in Fig. 1 contains two polypeptides at 110 and 78 kDa. Recovery of the 78-kDa band was variable between preparations, thus preventing isolation of sufficient amounts for further analysis. Proteolytic fingerprinting of both polypeptides suggested they are related. The 78-kDa band may be a proteolytic product of the larger 110-kDa band or the product of a second translation start site. The 110-kDa band was subjected to tryptic digestion and microsequencing; two peptides were initially identified, a 20-mer peptide, XVSLDPNFLDAYINLGNVLK, and a 17-mer, XXXSQLT(C)LG(C)LELIAK. The 20-mer was a perfect match to a sequence contained within the expressed sequence tag, cDNA clone yk13c2 (gb-CelK013C2F), and in a previously uncharacterized gene, K04G7.3 (accession number U21320), identified as part of the C. elegans genome sequencing project (Fig. 2A). Both peptide sequences were preceded by basic amino acids, consistent with the generation of these fragments by trypsin digestion. Fig. 2A shows the structure of the gene and localizes the tryptic peptides to the 8th and 14th exons in the C. elegans gene. Two human expressed sequence tags (accession numbers R75943 and R76782), showing greater than 60% identity to the C. elegans gene K04G7.3, were also identified and found to match perfectly the 17-mer rabbit OGT tryptic peptide (Fig. 2B).
Cloning of the cDNA Encoding C. elegans OGT-The OGT cDNA was isolated using a combination of phage library screening and polymerase chain reaction (see "Experimental Procedures"). The final clone (ZAP-CeOGT, accession number U77412) was sequenced and found to be nearly identical to the published CelK04G7.3 gene sequence (Fig. 3), except that it was lacking the third and fourth exons predicted by the program Genefinder (16). The exclusion of these two exons (see Fig. 2A), corresponding to base pairs 204-333 in the previously published sequence, does not affect the reading frame of the remaining sequence. Sequence analysis of the gene shows that it has 13 tandem tetratricopeptide repeats contained in exons 6-10, followed by a putative nuclear localization sequence near the end of exon 10 (Fig. 4).
The cysteine residues are tentative assignments because they could not be distinguished from glutamine residues, which would comigrate in the amino acid profile. B, the 3.1-kb full-length sequence of human liver OGT clone Lv4F (accession number U77413) is shown. The partial amino acid sequences of the two tryptic peptides isolated from the rabbit OGT, XVTLDPNFLDAYINLGNVLK and XXXSQLT(C)LG(C)LELIAK, are compared with clone Lv4F and to two human expressed sequence tags (accession numbers R75943 and R76782).
Cloning of the Human OGT-The human OGT was isolated using primers constructed from the sequence of the human expressed sequence tag (accession number R75943). These primers were used to screen SuperScript human brain and liver cDNA libraries using the Genetrapper cDNA-positive selection system (see "Experimental Procedures"). From 101 initial clones, a total of 8 full-length clones were identified. Six of these clones were obtained from liver and 2 from brain libraries; all had overlapping 5′ and 3′ untranslated sequences. When translated in vitro, each of these full-length clones produced a polypeptide of approximately 100 kDa and variable amounts of a smaller 70-kDa species. The 70-kDa species, resulting from an alternative translation start, may be related to the 78-kDa species that has been observed with purified OGT preparations (Fig. 1). One of the liver cDNA clones, Lv4F (accession number U77413), was fully sequenced (Fig. 5) and found to encode an open reading frame with 68% identity with the C. elegans K04G7.3 gene product over the C-terminal 872 amino acids (Fig. 4A). The human cDNA open reading frame encodes a shorter protein (103 kDa) containing only the last nine TPR sequences found in C. elegans (Fig. 4B).
While this is consistent with the observed size of the in vitro translation product, it is likely that post-translational modification of the enzyme occurs since the human OGT translated in reticulocyte lysate was slightly larger than the product seen from wheat germ extract (data not shown). This behavior has been previously observed for proteins modified by O-linked GlcNAc (4).
To examine the properties of the protein, we expressed OGT in E. coli as described under "Experimental Procedures." Attempts to purify fully functional native enzyme from the bacteria were unsuccessful since the enzyme aggregated into inclusion bodies and became insoluble. The enzyme could be solubilized in 6 M urea and purified by His-Tag chromatography. This recombinant form of OGT was used to raise polyclonal antisera in guinea pigs for immunodetection (see below).
Transgenic C. elegans Lines Overexpressing OGT-To examine the localization of the OGT in C. elegans, several transgenic lines were produced that overexpress the enzyme under control of heat shock promoters. When induced to overexpress by heat shock, the C. elegans OGT was readily detected by immunoblotting using the guinea pig antiserum (Fig. 6A). Indirect immunofluorescence of OGT using this antiserum in wild-type C. elegans embryos showed a punctate perinuclear and nuclear pattern (Fig. 6B, top panel). In embryos overexpressing the enzyme (Fig. 6B, lower panels), OGT was found within the nucleus in the gut, suggesting that the nuclear localization sequence in C. elegans OGT is functional. In other regions of the embryos, the overexpressed OGT exhibited a distinct perinuclear localization (small arrows). This was particularly striking in the neurons. Similar localization was observed in all of the lines produced, although the tissue distribution was somewhat dependent upon the heat shock promoter used, as has been previously reported (15). Thus, the enzyme was found in both the nucleus and the cytoplasm, depending on the tissue overexpressing the cDNA.
LG(C)LELIAK are underlined. The cysteine residues are tentative assignments because they could not be distinguished from glutamine residues, which would comigrate in the amino acid profile. The putative nuclear localization signal (NLS) is underlined. Sequence data are derived from the cDNAs (accession numbers U77412 and U77413) reported here. B, schematic diagram showing the relative sizes of the C. elegans and human OGT, as well as the location of TPR repeats and putative nuclear localization signal (NLS).
Functional Expression of OGT in HeLa Cells-The full-length human cDNA (clone Lv4F) was cloned into the pECE vector downstream of the SV40 promoter. HeLa cell cultures were transiently transfected with vector alone or the vector containing the clone Lv4F open reading frame. Cells were harvested at 24 h because the transfected cells did not survive well during prolonged incubations, suggesting the gene may be toxic to the cells. Toxicity had also been observed in experiments where the gene was overexpressed in transgenic C. elegans. Up to a 3-fold increase in enzyme activity relative to background activity was observed using two different transfection procedures (Fig. 7).
Conservation and Tissue Distribution of OGT-Hybridization of the human liver OGT cDNA (clone Lv4F) to genomic DNA digested with EcoRI from several different species is shown in Fig. 8. The Southern analysis identifies a single large fragment in human, whereas several smaller fragments are observed in rabbit, rat, and mouse genomic DNA. The high degree of conservation observed is not surprising since C. elegans and human OGT cDNAs were found to be so similar. Several additional sequences in the database searches were found to be related to the C. elegans and human sequences, including sequences from schistosomes and rice. To examine the relative abundance of the human OGT mRNA in various adult human tissues, a Northern blot analysis was performed (Fig. 9). The human clone Lv4F probe identifies four distinct bands at 9.3, 7.9, 6.3, and 4.4 kb, which are present in different amounts in various human tissues. Skeletal muscle and heart exhibited a relative enrichment of the 6.3-kb species, while all transcripts were at low relative abundance in the kidney and lung. The pancreas, where the two largest species (9.3 and 7.9 kb) were most abundant, showed the highest level of expression. Reprobing of the same blot with β-actin cDNA confirmed that similar levels of mRNA were loaded in each lane.
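Relative transcript abundance on a Northern blot is commonly expressed by dividing each band signal by the loading-control signal from the same lane and then comparing tissues. The sketch below shows that normalization; the densitometry values are hypothetical placeholders, and the text does not state how band intensities were quantified here.

```python
from typing import Dict, Tuple


def normalized_signal(band: float, loading_control: float) -> float:
    """Band intensity divided by the loading-control intensity of the same lane."""
    if loading_control <= 0:
        raise ValueError("loading-control signal must be positive")
    return band / loading_control


def fold_over_reference(lanes: Dict[str, Tuple[float, float]],
                        reference: str) -> Dict[str, float]:
    """Normalized signal of every lane relative to a chosen reference lane."""
    ref = normalized_signal(*lanes[reference])
    return {tissue: normalized_signal(band, ctrl) / ref
            for tissue, (band, ctrl) in lanes.items()}


# Hypothetical densitometry readings: (OGT band, beta-actin band), arbitrary units
lanes = {
    "pancreas": (2400.0, 100.0),
    "lung": (180.0, 90.0),
    "kidney": (220.0, 110.0),
}

for tissue, fold in fold_over_reference(lanes, reference="lung").items():
    print(f"{tissue}: {fold:.1f}-fold relative to lung")
```

Normalizing to the loading control before taking ratios keeps small lane-to-lane differences in RNA loading from being mistaken for differences in expression.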

OGT Is Encoded by a Previously Unidentified, Conserved Gene in C. elegans-Here we describe the first molecular characterization of a mammalian OGT. The OGT was purified using recombinant rat nuclear pore protein, Nup62, as substrate. The enzyme, isolated from rabbit blood (110 kDa), was subjected to trypsin fragmentation, high pressure liquid chromatography purification, and microsequencing. The partially sequenced enzyme was found to be nearly identical to an open reading frame in the C. elegans gene, K04G7.3, on chromosome III (Fig. 2A). Using this sequence information, we isolated a full-length cDNA clone corresponding to the nematode gene (Fig. 3). A human EST was used to make primers to the C-terminal part of the gene to isolate the human cDNA (Fig. 2B, see also Fig. 5). A search of GenBank™ using these genes identified homologous EST sequences from Schistosoma mansoni and rice. O-Linked GlcNAc has been previously reported in schistosome glycoproteins (17) and in plants (18).
OGT Is a Highly Conserved Member of the TPR Family of Proteins-The human and C. elegans OGT cDNAs isolated were very similar, showing 66% identity at the nucleotide level and 68% identity at the amino acid sequence level (Fig. 4A). Both polypeptides have TPR motifs in the N-terminal part of the protein. The C. elegans open reading frame encodes a larger protein with 13 tandem TPR motifs compared with 9 for the human gene (Fig. 4B). TPR is a 34-amino acid repeated motif having the following 8 loosely conserved residues: -W-LG-Y-A-F-A-P- (19). Over the TPR region the identity between human and C. elegans OGT is even more striking, with 87% identity at the amino acid level. TPR motifs are typically arranged as tandem arrays, as seen here for OGT. The motif has been identified in a wide variety of proteins in multimeric complexes involved in neurogenesis, cell cycle control, transcription, and peroxisomal transport (20). Several proteins involved in these processes are also known substrates for OGT. These include neurofilaments, tau, synapsins, RNA polymerase II, Sp1, c-Fos, c-Jun, c-Myc, v-Erb A, and the estrogen receptor (9). It is believed that TPR motifs can interact with each other, mediating intra- and intermolecular protein-protein interactions. The motifs have also been suggested to play a role in targeting proteins to their proper intracellular localization (21). It is therefore likely that the TPR domain of OGT plays a role in targeting the enzyme to its many sites of action.
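Percent identity figures such as those quoted above are computed from an alignment by counting matching positions. The sketch below shows one common convention, in which columns containing a gap in either sequence are excluded from the denominator; other conventions exist, the text does not say which was used, and the two aligned strings are made-up toy sequences rather than OGT sequence.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identical positions over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # skip gapped columns (an assumed convention)
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy pre-aligned sequences, purely illustrative
seq_a = "MKT-AYIAKQR"
seq_b = "MKTLAYLAK-R"

print(f"{percent_identity(seq_a, seq_b):.1f}% identity")
```

Because the choice of denominator changes the result, identity values are only comparable when they were computed over the same region with the same convention, which is why the text reports separate figures for the full alignment and for the TPR region.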
Evidence from database searches and our Southern hybridization results (Fig. 8) suggests that OGT is highly conserved among eukaryotic species. Interestingly, no clear-cut structural homolog emerged from an examination of the Saccharomyces cerevisiae genome. Three genes emerged as having some structural similarities to OGT. These proteins are TPR proteins involved in cell cycle control and transcription: Cdc16, Cdc27, and SSN6. SSN6 has very similar TPR repeats to those of OGT and exhibits 42% similarity and 20% identity to both human and C. elegans OGT throughout its entire length. SSN6 is of particular interest since it is known to be involved in transcriptional repression in response to glucose; this is one of the proposed roles of OGT in mammalian cells (see below).
OGT Contains a Bipartite Nuclear Localization Sequence Which Can Function to Direct Nuclear Localization-Both the human and C. elegans OGTs have putative nuclear localization sequences immediately after the last TPR motif, although the human sequence has a single amino acid change rendering it less prototypic (Fig. 4A). Indirect immunofluorescence of the native enzyme in C. elegans embryos (Fig. 6) shows a nuclear and punctate perinuclear staining. Overexpression of the recombinant enzyme in C. elegans transgenic lines using heat shock promoters shows an accumulation of the enzyme in the nucleus and cytoplasmic aggregates. The transgenic lines expressed OGT in both the gut and neural tissues where the heat shock promoters preferentially drive expression of exogenous genes. It is likely that both the nuclear localization sequence and TPR repeats play a role in maintaining the steady-state localization of OGT observed in C. elegans transgenic lines.
Overexpression of the OGT cDNA Is Sufficient to Increase Enzyme Activity in Human Cells-Our studies suggest that expression of human OGT cDNA is sufficient to elevate the levels of OGT activity in a transfected HeLa cell population. Under conditions in which approximately 10-20% of the cells were expressing the exogenous cDNA, the enzyme activity was elevated 2-3-fold. Overexpression of the enzyme was apparently toxic, since expressing cells showed a much reduced growth rate compared with controls. The protein is also likely to interact with cellular proteins, which could alter its activity or modify it. Post-translational modification of the OGT, possibly by phosphorylation, could alter the level of glycosylation of cytoplasmic and nuclear proteins.
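Because only a fraction of the cells took up the plasmid, the increase measured in the whole population understates the increase within expressing cells. The sketch below works through that arithmetic under a simplifying assumption that is not stated in the text: non-transfected cells keep baseline activity, so the population fold change F is a weighted average, F = (1 - e) + e·x, where e is the transfected fraction and x is the per-cell fold change.

```python
def per_cell_fold(population_fold: float, transfected_fraction: float) -> float:
    """Solve F = (1 - e) * 1 + e * x for x, the fold change in expressing cells.

    Assumes untransfected cells retain baseline (1-fold) activity.
    """
    if not 0.0 < transfected_fraction <= 1.0:
        raise ValueError("transfected_fraction must be in (0, 1]")
    return (population_fold - (1.0 - transfected_fraction)) / transfected_fraction


# Ranges quoted in the text: 2-3-fold population increase, 10-20% efficiency
for population_fold in (2.0, 3.0):
    for efficiency in (0.10, 0.20):
        estimate = per_cell_fold(population_fold, efficiency)
        print(f"F={population_fold:.0f}, e={efficiency:.0%}: ~{estimate:.0f}-fold per expressing cell")
```

Under this assumption the quoted ranges correspond to roughly a 6- to 21-fold increase in the cells that actually express the cDNA, which would be consistent with the reduced growth of expressing cells noted above.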
Although Ubiquitously Expressed in Adult Human Tissues, OGT Is Enriched in the Pancreas-The Northern analysis distinguishes four distinct OGT transcripts at 9.3, 7.9, 6.3, and 4.4 kb (Fig. 9). The signal in the pancreas is over 12-fold higher than that seen in the lung and kidney. There appears to be a tissue-specific distribution of these different bands. The largest signals at 9.3 and 7.9 kb are most abundant in the pancreas and placenta, while the 6.3-kb transcript is the major signal seen in the other tissues. It is not known at this time if the multiple transcripts represent the transcription of different genes or alternative splicing and processing of the same gene. The large size of the mRNA transcripts compared with the isolated clones and open reading frame of the gene presumably corresponds to extensive 5′ and 3′ untranslated sequences. This has been observed for a number of glycosyltransferases (22). The role of these large regions of untranslated mRNA is not known, but it may be important in regulation of these genes. The human clones identified here also show variation in the polyadenylation signal, which could partially explain the different size of the messages.
Cells were transfected with plasmid containing the human transferase clone pECE-Lv4F or with the control plasmid alone pECE, harvested at 24 h, and assayed for O-linked GlcNAc transferase activity as described under "Experimental Procedures." The data are expressed in terms of the fold enrichment observed in OGT-specific activity relative to untransfected HeLa cells. The specific activity of HeLa cells was approximately 900 dpm/µg of protein.
FIG. 8. Hybridization of human liver O-GlcNAc transferase to genomic DNA from various species. Genomic DNA (3 µg/lane) was digested with EcoRI, separated by electrophoresis on 0.7% agarose gel, and transferred onto nylon membranes. The blot was probed with radiolabeled full-length human liver clone Lv4F and exposed to Kodak Bio-Max MR film at −70°C for 3 days. The location of 1-kb ladder standards is shown to the left.
FIG. 9. Northern blot analysis of human tissues. Poly(A) RNA (2 µg) from a variety of adult human tissues was probed with radiolabeled full-length human liver clone Lv4F (GlcNAc-T Probe Lv4F) and exposed to Kodak Bio-Max MR film at −70°C for 3 days. The blot was stripped according to the manufacturer's protocol and rescreened with a human β-actin probe (β-Actin Probe). The location of standards (kb) is shown to the left.
The Hexosamine Biosynthetic Pathway and Glucose Metabolism-The hexosamine biosynthetic pathway is responsible for the synthesis of cytoplasmic UDP-GlcNAc utilized by OGT. Normally 2-3% of incoming glucose fluxes through this pathway (23). Increased glucose flux through the hexosamine biosynthetic pathway, caused by hyperglycemia, has been shown to mediate insulin resistance (23-28). The hexosamine biosynthetic pathway, by controlling intracellular UDP-GlcNAc concentrations, may be acting in peripheral tissues as a glucose sensor, which is reflected in substrate-driven O-linked GlcNAc modification of intracellular proteins by OGT. Glucosamine administration has been shown to impair insulin secretion from the pancreas in response to glucose both in vitro and in vivo (29). We found OGT to be highly abundant in the pancreas, further suggesting a possible role in insulin secretion and glucose homeostasis.

Similarity between Phosphorylation Sites and Sites of O-Linked GlcNAc Addition-O-Linked GlcNAc modifies many phosphoproteins, which are components of multimeric complexes. The sites modified by O-linked GlcNAc often resemble phosphorylation sites, leading to the suggestion that the modifications may compete for substrate in these polypeptides (9). In general, the sites modified by OGT very closely resemble those of the glycogen synthase kinases (GSK-3, casein kinase II) and mitogen-activated protein kinase. Interestingly, insulin activates the mitogen-activated protein kinase cascade, relieving GSK-3 inhibition of glycogen synthase, the rate-limiting enzyme in glycogen synthesis (30, 31). GSK-3 also modifies the oncogene c-jun and negatively regulates its transactivating potential in vivo. Another oncogene, c-myc, is both modified by O-linked GlcNAc and phosphorylated by GSK-3 in a domain required for transcriptional activation (31-33). Glucose-responsive elements from several mammalian genes have been identified and include myc-like response elements (34). Therefore, O-linked GlcNAc addition and phosphorylation by kinases such as GSK-3 may have as a common denominator their involvement in transcriptional regulation of glucose metabolism.