The Making of a Sweet Modification: Structure and Function of O-GlcNAc Transferase

O-GlcNAc transferase is an essential mammalian enzyme responsible for transferring a single GlcNAc moiety from UDP-GlcNAc to specific serine/threonine residues of hundreds of nuclear and cytoplasmic proteins. This modification is dynamic and has been implicated in numerous signaling pathways. An unexpected second function for O-GlcNAc transferase as a protease involved in cleaving the epigenetic regulator HCF-1 has also been reported. Recent structural and biochemical studies that provide insight into the mechanism of glycosylation and HCF-1 cleavage will be described, with outstanding questions highlighted.

There are three physiological isoforms of OGT, differing primarily in the number of N-terminal tetratricopeptide repeats (TPRs) (32-34). The longest isoform, with 13.5 TPRs, is referred to as nuclear and cytoplasmic OGT (ncOGT). A shorter form, with 9 TPRs and bearing a mitochondrial targeting sequence, is localized to the inner mitochondrial membrane (mOGT). The smallest of the three, sOGT, has only 2.5 TPRs and is perhaps the least studied. Interestingly, mOGT appears to have no in vivo catalytic function (33), but instead has been speculated to be involved, along with sOGT, in apoptosis (35,36).
The first structure of a segment of OGT was reported in 2004. This structure comprised 11.5 TPRs from the N terminus of the human protein and showed that the TPRs assemble into an elongated superhelix (Fig. 2A) (37). Several features of the TPR structure are noteworthy. In the asymmetric unit, two TPR molecules pack as a homodimer, with the interface centered on TPRs 6 and 7 (37). Previous studies had suggested that OGT exists as an oligomer in solution, and the TPRs were implicated in mediating its oligomerization (27,38). The TPR crystal structure suggested that OGT may exist as a dimer, and consistent with this, mutating the residues at the TPR dimer interface afforded a lower order species. Because mutations that disrupted this interface did not substantially affect enzymatic activity in vitro, it remains unclear what role oligomerization serves for OGT function. The TPR structure also suggested a high degree of conformational flexibility, with one monomer bent by ~40° between TPRs 9 and 10 (37). Although this deformation could be due to crystal-induced unfolding, the authors suggested that the conformational flexibility might have implications for substrate recognition. Finally, phylogenetic comparisons showed strong conservation in solvent-exposed residues lining the interior of the TPRs, particularly several asparagine residues near the center of the cavity, and suggested an unexpected similarity to armadillo repeat-containing proteins, such as importin-α and β-catenin (37).

FIGURE 2.
A, a single chain from the 11.5-TPR structure. The C and N termini are labeled, as is the position of the first TPR. B, structure of hOGT4.5. The complex shown contains UDP (purple spheres) and a peptide (orange balls and sticks) derived from a well studied OGT substrate, CKII (27). The intervening domain (green), two catalytic lobes of the GT-B fold (blue and red), and the TPRs (gray) are shown in ribbon representation. C, schematic representation of OGT using the same coloring as described for B. D, model of ncOGT constructed from the 11.5-TPR structure and the hOGT4.5 structure. The dark-gray TPRs are derived from the hOGT4.5 structure, whereas the light-gray TPRs are from the 11.5-TPR structure. C-Cat and N-Cat, C- and N-terminal catalytic domains, respectively.

solved with UDP, UDP-GlcNAc, the hydrolysis-resistant analog UDP-2-acetamido-2-deoxy-5-thio-D-glucopyranose (UDP-5SGlcNAc) (42), and several different peptide and glycopeptide substrates bound. Hence, binary as well as ternary substrate-like and product complexes are available (4,11,43-45). A structure containing a bisubstrate analog has also been reported (46). Taken together, these structures offer perhaps the most complete structural picture yet obtained for any glycosyltransferase. By combining the TPR structure with the catalytic domain structure, a model for human ncOGT, containing all 13.5 TPRs, can be created (Fig. 2D). In this model, the TPRs form two complete superhelical turns and extend upward from the active site.
The structures of OGT have revealed many important features. As expected, the catalytic domain of OGT possesses a GT-B fold, reminiscent of that found in MurG (CAZy glycosyltransferase family 28), GtfB (CAZy glycosyltransferase family 1), and many other glycosyltransferases (47-52). GT-B glycosyltransferases possess a catalytic domain comprising two lobes that each adopt a Rossmann-like fold. OGT differs from other GT-B glycosyltransferases in several key respects, however. The N-terminal catalytic domain bears two additional helices that are crucial components of the active site (4). Moreover, it possesses an ~120-amino acid insertion (termed the intervening domain (Int-D)) between its two catalytic lobes (4). This insertion forms a domain that packs exclusively against the C-terminal catalytic domain. Although its structure is now known (Fig. 2B), its function remains mysterious.
The peptide substrate binds over the UDP-GlcNAc, and the peptide-binding site spans a groove formed between the two catalytic lobes and the TPR domain (Fig. 2B). The extended conformation of the peptide substrate likely explains the preference for proline and β-branched amino acids near the site of glycosylation (4). There is an interaction between the α-phosphate of the nucleotide sugar and the amide N-H of the acceptor serine/threonine that is proposed to play a role in peptide binding to OGT (4) and may further serve to orient the substrate for reaction (Fig. 3A) (43). Although several OGT residues contact the backbone of the peptide substrate, few make specific contacts with the peptide side chains, consistent with the lack of a clear consensus sequence (4). Peptide binding is accompanied by widening of the cleft between the catalytic domain and the TPRs, and molecular dynamics simulations have suggested that the TPRs are able to pivot about a hinge region between TPRs 12 and 13, giving rise to open and closed states for the active site (4). Such molecular motion would be crucial for accommodating larger substrates, such as loops in folded proteins. Moreover, the flexibility is consistent with that seen in the TPR crystal structure and with the previous suggestion that the TPRs mediate substrate binding (37).

Mechanism of Protein Glycosylation
Although it was initially proposed that OGT follows a random Bi Bi kinetic mechanism (27), structural information showed that the polypeptide substrate binds above the UDP-sugar, which suggested an ordered mechanism in which UDP-GlcNAc binds first. Product inhibition studies also supported an ordered Bi Bi mechanism (4). Hence, UDP-GlcNAc binds first to OGT, followed by the polypeptide substrate, which makes extensive contacts with the sugar donor (Fig. 3A). Notably, kinetic studies have shown that the Km(app) values for UDP-GlcNAc can differ by up to an order of magnitude depending on the polypeptide substrate examined, which underscores the importance of substrate-substrate interactions in the mechanism (27,53). After the ternary substrate complex forms, the acceptor side chain attacks the anomeric carbon, with loss of UDP, to form a β-glycosidic linkage. Finally, the glycopeptide dissociates, followed by UDP.
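The dependence of the apparent Km for the donor on the acceptor can be illustrated with the textbook steady-state rate law for an ordered Bi Bi mechanism in the absence of products. The following is a minimal sketch, not a fit to OGT data: the function names, the two example acceptors, and every numerical constant are hypothetical placeholders chosen only to show the algebra.

```python
"""Ordered Bi Bi initial-rate law (products absent), in textbook Cleland form.

A = donor (binds first), B = acceptor (binds second).
All numerical constants are hypothetical placeholders, not measured OGT values.
"""


def ordered_bi_bi_rate(a, b, vmax, k_a, k_b, k_ia):
    """v = Vmax*A*B / (K_ia*K_b + K_b*A + K_a*B + A*B)."""
    return vmax * a * b / (k_ia * k_b + k_b * a + k_a * b + a * b)


def apparent_km_for_a(b, k_a, k_b, k_ia):
    """Apparent Km for A at fixed B, from rearranging the rate law in A."""
    return (k_ia * k_b + k_a * b) / (k_b + b)


def apparent_vmax_for_a(b, vmax, k_b):
    """Apparent Vmax for A at fixed B."""
    return vmax * b / (k_b + b)


# Two hypothetical acceptors that differ in their acceptor-dependent constants.
ACCEPTORS = {
    "peptide_1": {"k_a": 1.0, "k_b": 5.0},
    "peptide_2": {"k_a": 8.0, "k_b": 50.0},
}
K_IA = 10.0    # dissociation constant of A from the binary complex (hypothetical)
B_FIXED = 5.0  # fixed acceptor concentration (hypothetical, same units as k_b)

for name, p in ACCEPTORS.items():
    km_app = apparent_km_for_a(B_FIXED, p["k_a"], p["k_b"], K_IA)
    print(f"{name}: apparent Km for donor = {km_app:.2f}")
```

In this form, the apparent Km for the donor tends toward K_ia as the acceptor concentration approaches zero and toward K_a at saturating acceptor, so the measured value can shift with both the identity and the concentration of the polypeptide.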
In addition to binding the sugar donor and acceptor substrates in close proximity (in the case of OGT, UDP-GlcNAc and a polypeptide, respectively), it has long been presumed that glycosyltransferases must do two things to catalyze glycosylation: 1) activate the leaving group of the glycosyl donor toward departure and 2) remove the proton from the hydroxyl group on the glycosyl acceptor (48). GT-A superfamily glycosyltransferases use a metal ion to activate the leaving group, but OGT and other GT-B glycosyltransferases are metal ion-independent enzymes and use a different strategy (Fig. 3B) (48).
Leaving group activation in OGT is achieved through several elements that stabilize the buildup of negative charge. First, the β-phosphate of UDP-GlcNAc is anchored by hydrogen bonds to the N terminus of an α-helix; these include key interactions with His-920, Thr-921, and Thr-922. Further stabilization from this helix is derived from electrostatic interactions between the negatively charged phosphate and the net helix dipole (43). Other GT-B superfamily members use a similar mode of activation (48), with some anchoring the α-phosphate, rather than the β-phosphate, of the leaving group (54,55). Second, the side chain of a lysine residue (Lys-842) is positioned directly below the β-phosphate (Fig. 3C) and presumably further activates the leaving group toward departure. Lys-842 plays an essential role in the enzymatic activity of OGT (4,40,43,44). Because Lys-842 is chemically reactive, as evidenced by its nucleophilicity toward a covalent OGT inhibitor (45), its pKa may be suppressed due to its proximity to the aforementioned helix dipole; however, its protonation state remains to be established explicitly (45). This combination of a helix dipole and carefully positioned hydrogen bonds from the N terminus of the helix and a critical lysine to the β-phosphate activates UDP-GlcNAc toward nucleophilic attack. Nevertheless, glycosylation is favored over hydrolysis, raising the question of how selectivity for reaction with an acceptor hydroxyl instead of water is achieved, especially given that several ordered water molecules are visible within the active site of all structures below 2.0 Å resolution (Fig. 3C). It seems likely that the aforementioned contact between the α-phosphate of UDP-GlcNAc and the amide N-H of the acceptor serine (Fig. 3B) is an important element for this selectivity. This contact may serve to bring the acceptor and donor into close proximity and position the acceptor hydroxyl in a more favorable position to form a bond with the anomeric carbon (labeled C1 in Fig. 3D) compared with any water molecule (Fig. 3C) (43). It may also further activate the leaving group for departure.
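To make concrete why a suppressed pKa would matter for the reactivity of Lys-842, the Henderson-Hasselbalch relation gives the fraction of an amine that is present in its neutral, nucleophilic form at a given pH. The sketch below is illustrative only: the pKa of Lys-842 has not been established, so the "suppressed" value, the solution reference value, and the pH are all assumptions, and the function name is hypothetical.

```python
"""Fraction of an amine in the neutral form from the Henderson-Hasselbalch relation.

The pKa values and pH used in the example are assumptions for illustration;
the actual pKa of Lys-842 in OGT has not been established.
"""


def fraction_deprotonated(pka, ph):
    """Return the fraction of an amine present as the neutral (deprotonated) species."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


PH = 7.4                  # assumed near-physiological pH
PKA_SOLUTION = 10.5       # approximate pKa of a lysine side chain in solution
PKA_SUPPRESSED = 8.0      # hypothetical suppressed pKa, for comparison only

for label, pka in (("solution-like", PKA_SOLUTION), ("suppressed", PKA_SUPPRESSED)):
    frac = fraction_deprotonated(pka, PH)
    print(f"{label} lysine (pKa {pka}): neutral fraction = {frac:.4f}")
```

Under these assumed numbers, a drop of a few pKa units raises the neutral fraction by orders of magnitude, which is the sense in which a suppressed pKa could account for the observed nucleophilicity.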
When substrate and product ternary complexes are examined, modest changes in several positions can be seen, with the C1, C2, and O5 atoms of the GlcNAc displaying the greatest displacement (Fig. 3D). A net upward rotation of the sugar moves the anomeric carbon away from the β-phosphate of UDP and into bonding distance with the acceptor hydroxyl. The inferred reaction trajectory is consistent with an electrophilic migration-type mechanism (43). Another notable change is the rotation of the C2 acetamide, which contacts His-498 and rotates away from C1 along the reaction coordinate (Fig. 3D). The importance of the C2 acetamide was further demonstrated by the fact that OGT can use UDP-GalNAc as a donor, but not UDP-glucose or UDP-2-keto-Glc, a UDP-GlcNAc analog in which the C2 N-H is replaced with CH2 (43).
In addition to providing information about leaving group stabilization and reaction trajectory, the OGT ternary structures revealed a surprise: there is no OGT side chain that can serve as a catalytic base in the immediate vicinity of the reactants (43,44). Two different side chains, His-498 and His-558, had previously been suggested as candidates for the catalytic base, but both were ruled out after the examination of ternary structures containing substrate analogs implicated them in other roles (43,44). Two alternatives for how proton transport is achieved have been suggested (Fig. 3E). In one, the α-phosphate of UDP is proposed to act as a base to deprotonate the acceptor hydroxyl (Fig. 3E, Proposal I) (44). In the other, a chain of water molecules leading to an aspartate side chain (Asp-554) is proposed to facilitate transport of the proton liberated during the reaction from the active site (Fig. 3E, Proposal II)   hypothesis, both diastereomeric phosphodithioate analogs of UDP-GlcNAc were prepared, and it was found that although both had a similar binding affinity for OGT, only one of them functioned as a donor substrate (44). Although this finding suggests that the α-phosphate is involved in the catalytic mechanism, results involving substrate mimics must be examined cautiously. Replacing the acceptor serine with an aminoalanine may, depending on the protonation state of the amine, bring the amino group in closer proximity to the α-phosphate than it would be in the native substrate. Furthermore, oxygen-to-sulfur substitution on the phosphate would likely affect the previously mentioned hydrogen bond to the backbone amide of the acceptor (Fig. 3, C and D). Given that the importance of this interaction has not been rigorously examined, its energetic contribution to binding should be quantified.
The proposal that the proton is shuttled out of the active site by a Grotthuss-like (56,57) mechanism invokes the active-site ordered water molecules in the mechanism in lieu of implicating a catalytic base within OGT. In a transition state involving an oxocarbenium ion-like species, as computational studies have suggested for OGT (58), a general base may not be needed to deprotonate the acceptor hydroxyl for the reaction to occur. In this case, the ordered water molecules leading to Asp-554 may simply provide a pathway for proton extrusion from the active site. It has been shown that mutation of Asp-554 to alanine abolishes catalytic activity (59), but mutation to asparagine does not (44). Further study is required to resolve the issue of how the proton on the acceptor hydroxyl is removed.

HCF-1 Is Cleaved by OGT
In 2011, OGT was implicated as the cellular factor responsible for the maturation of an essential transcriptional co-activator, HCF-1 (9,10). HCF-1 was first identified in the late 1980s as a component from HeLa cell lysate required for the stable formation of the herpes simplex VP16-induced complex and the expression of immediate-early genes during herpes virus infection (60-65). HCF-1 has since been implicated as a critical element in cell cycle regulation, including both G1-to-S and M phase progression (66-73). Biochemically, the first report that described the purification of HCF-1 (then called C1), by Kristie and Sharp (74), suggested its existence as a set of heterogeneous, but related, polypeptide fragments; later work revealed that these polypeptide fragments were derived from a large precursor protein by some proteolytic maturation process (75). Although the protease responsible for this cleavage remained elusive, pulse-chase studies suggested that cleavage took place largely in the nucleus, but also, albeit to a lesser extent, in the cytoplasm (76). Cleavage appeared to take place at one or more of six 26-amino acid repeating elements, which constituted an atypically large protease recognition motif (75,76). Based on N-terminal peptide sequencing, cleavage was reported to occur between Glu-10 and Thr-11 of a given repeat (76,77). One striking feature of this cleavage was the fact that even though it was shown to be required for proper HCF-1 function (78), the two halves of the protein remain noncovalently associated post-processing (76,79,80). Although several mechanisms for cleavage were considered (9,10,81), the finding that HCF-1 interacts stably with and is modified by OGT (9,67,82,83) led to the suggestion that OGT may somehow promote cleavage. This raised several questions. Was there a second OGT active site responsible for this new transformation? What was the mechanism of the transformation? Why would nature connect OGT activity with the cleavage of HCF-1?
The most recent studies into HCF-1 proteolysis exploited a combination of structural and biochemical tools to confirm that OGT indeed catalyzes cleavage, and they have provided additional insights into this intriguing process (11). A cleavage substrate devoid of Ser/Thr glycosylation sites was developed, and reconstitution of the cleavage reaction established that UDP-GlcNAc is required for HCF-1 cleavage, whereas UDP or UDP-5SGlcNAc inhibits cleavage (11). Inactivation of OGT, through either a K842A mutation or modification with a covalent OGT inhibitor (45), also blocked cleavage (11). The finding that UDP-5SGlcNAc, which is resistant to glycosylation and hydrolysis (42,43), is unable to promote cleavage, even though it binds identically to UDP-GlcNAc (43), suggested that the nucleotide sugar must serve a functional role rather than simply a structural one.
Mass spectrometry established that cleavage in fact occurs between Cys-9 and Glu-10, rather than between Glu-10 and Thr-11 (Fig. 4A) (11), and results in formation of a C-terminal cleavage product whose N terminus has cyclized to form a pyroglutamate residue (11). The formation of this product would require activation of the side chain carboxylate of the glutamate. Although there are several possible mechanisms by which this could occur, structural studies have established that the cleavage region of the HCF-1 proteolytic repeat binds in the active site in a manner almost identical to that of the casein kinase II (CKII) glycosylation substrate and with the glutamate side chain (Glu-10) in the same position as the reactive serine of CKII (Fig. 4A) (11). Indeed, converting the glutamate residue to serine transforms a cleavage substrate into a glycosylation substrate (11). These findings implied that cleavage occurs in the same active site of OGT as glycosylation and suggested that the glutamate carboxyl group might react with the UDP-GlcNAc substrate to form a glutamyl-GlcNAc intermediate (Fig. 4B). Although enzymatic glycosylation on glutamate has not, to our knowledge, been reported, there is precedent for the spontaneous hydrolysis of internal pyroglutamates that form by other mechanisms (84-86). Additional studies will be required to elucidate details of the catalytic mechanism of HCF-1 cleavage by OGT. There are outstanding questions about the possible roles of the N-terminal pyroglutamate in the biological function of HCF-1 and whether some of the effects of OGT that have been attributed to its glycosylation function are in fact due to its role in HCF-1 cleavage. However, as has been suggested (9,11), there is a unifying theme: both glycosylation and cleavage depend on UDP-GlcNAc concentration, and cleavage may thus be another mechanism for linking cell cycle regulation to the metabolic status of the cell, as for glycosylation.
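A pyroglutamate at the N terminus of a cleavage product is recognizable by mass because cyclization of an N-terminal glutamate eliminates one molecule of water. The short sketch below computes that shift from standard monoisotopic atomic masses. It is an illustration of the arithmetic only, not a reconstruction of the analysis reported in (11); the helper name and data layout are hypothetical.

```python
"""Monoisotopic mass shift for cyclization of N-terminal Glu to pyroglutamate.

Atomic masses are standard monoisotopic values; the helper below is a
hypothetical illustration, not the analysis used in the cited study.
"""

MONOISOTOPIC = {"H": 1.00782503, "C": 12.0, "N": 14.00307401, "O": 15.99491462}


def formula_mass(counts):
    """Sum monoisotopic masses for an elemental composition given as {element: count}."""
    return sum(MONOISOTOPIC[element] * n for element, n in counts.items())


GLU_RESIDUE = formula_mass({"C": 5, "H": 7, "N": 1, "O": 3})       # Glu residue, C5H7NO3
PYROGLU_RESIDUE = formula_mass({"C": 5, "H": 5, "N": 1, "O": 2})   # pyroGlu residue, C5H5NO2
WATER = formula_mass({"H": 2, "O": 1})

shift = PYROGLU_RESIDUE - GLU_RESIDUE
print(f"Glu residue:     {GLU_RESIDUE:.5f} Da")
print(f"pyroGlu residue: {PYROGLU_RESIDUE:.5f} Da")
print(f"mass shift:      {shift:.5f} Da (loss of one water, {WATER:.5f} Da)")
```

A C-terminal fragment beginning at Glu-10 that appears about 18.01 Da lighter than the unmodified sequence is therefore consistent with an N-terminal pyroglutamate.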

The TPR Domain Plays a Role in Substrate Recognition
Although the TPR domain of OGT has been implicated in protein-protein interactions (see above) and is presumed to play a key role in substrate selection, there is little information on how protein substrates or adaptor/enhancer proteins are recognized by this domain. Structural studies involving the HCF-1 proteolytic repeat bound to OGT provided the first information about how the TPRs of OGT recognize at least some of its substrates. In addition to the portion of an HCF-1 repeat where cleavage takes place, there is a C-terminal threonine-rich region that was implicated in binding to the TPRs of OGT. Recent crystal structures revealed how the threonine-rich region binds to OGT (Fig. 4, C and D) (11). The structures show an elaborate network of hydrogen bonds that envelop the threonine-rich C terminus of the HCF-1 repeat within the TPRs of OGT (Fig. 4C) (11). Within this region, HCF-1 is bound in an extended conformation, with the highly conserved ladder of OGT asparagine residues (see above) engaging the peptide, mostly through interactions with the amide backbone. By engaging the peptide backbone, this asparagine network positions the side chains, which, in this case, are largely threonines, to make additional hydrogen bonds with polar residues, mostly aspartates, that line the interior of the TPR cavity. This likely provides sequence selectivity, as large or hydrophobic residues would be disfavored. This structure confirms that there is indeed an analogy between the substrate-binding mode of OGT and that of some other proteins that recognize multiple related ligands (see above) (37). For example, β-catenin, which binds to various cadherins (87,88); PEX5, which recognizes a peroxisomal targeting signal (PTS1) (89); karyopherin (importin)-α, which recognizes a nuclear localization sequence (88); and the histocompatibility protein HLA-DR1, which recognizes pathogen-derived peptides (90) all employ asparagine-mediated peptide backbone contacts as a crucial component of their ligand binding. There are two major reasons for this: backbone-binding interactions provide sequence-independent binding energy and predictably splay the side chains of the ligand (Fig. 4C), which in turn allows for sequence-selective recognition at defined sites along the polypeptide chain. In the context of OGT, this might imply that a consensus sequence for adaptor proteins or for substrates might in fact exist and remains to be discovered. As such, the interaction between HCF-1 and OGT has illuminated one mechanism for substrate engagement, but whether this generalizes to other substrates remains to be seen.

FIGURE 4.
A, overlay of the ternary complexes HCF-1·UDP-5SGlcNAc·hOGT4.5 and OGT·UDP-5SGlcNAc·CKII. HCF-1 (yellow sticks) resides above UDP-5SGlcNAc (gray sticks). The CKII (tan sticks) and HCF-1 peptides overlay almost perfectly. B, overall cleavage reaction catalyzed by OGT. A proposed glutamyl-GlcNAc intermediate is shown. C, the C terminus of the HCF-1 peptide (yellow sticks) makes extensive contacts within the TPRs of OGT (gray ribbon). Glu-13 and Thr-24 (indicating the relative positions of the N and C termini, respectively) are indicated in boldface italics. Hydrogen bonds are shown as black dotted lines. Important OGT residues (gray sticks) are labeled according to ncOGT numbering. Contacts from two lysine residues (gray sticks) were omitted for clarity. D, overall structure of the ternary complex containing HCF-1 (yellow balls and sticks), UDP-5SGlcNAc (magenta spheres), and hOGT4.5 (gray ribbon).

Conclusions
The past few years have seen enormous progress in understanding the structure and mechanism of OGT. Although these insights stand to benefit the understanding of other GT-B superfamily members, from a structural and biochemical perspective, there are still several unresolved questions regarding OGT. What is the most likely mechanism for shuttling a proton from the active site? What role does oligomerization play in OGT function? What is the function of the intervening domain? OGT has been shown to bind to phosphatidylinositol 3,4,5-trisphosphate (PIP3) (91,92), and mutagenesis suggested that Lys-981 and Lys-982, both residing in the C-terminal catalytic domain, were responsible for PIP3 binding. However, it has been reported that the K981A/K982A double mutant binds PIP3 as well as other PIPs comparably to wild-type ncOGT (4). Furthermore, several combinations of lysine residues in the C-terminal catalytic domain or Int-D were mutated, but none of these mutations seemed to abrogate PIP binding. Due to the ambiguity of these findings and given the highly basic nature of the Int-D, which might suggest a role in mediating interactions with anionic macromolecules or membrane interfaces, further investigations into the function of this domain and the location of the PIP-binding site are certainly warranted. How does OGT interact with larger polypeptides, such as folded proteins? Numerous proteins have been shown to interact with OGT directly or indirectly, although little, if anything, is known about the physical picture of these interactions. Of the characterized interactions with OGT, some are mediated by the TPRs (see above), whereas others seem to interact elsewhere (23,28). Furthermore, TPRs in different systems have been shown to interact with proteins in several different ways (18). HCF-1 provided the first picture of a substrate engaging the TPRs of OGT, but it remains to be seen whether other substrates, adaptors, or enhancers interact with OGT analogously. Several recent publications have exploited protein microarrays as a tool that allows for precise systematic control of individual parameters, which is likely to be crucial in examining the role of possible adaptors/enhancers (93-95). More biochemical and structural data on how OGT physically interacts with different proteins would add greatly to understanding its function.