Structural Insights into the Glycosyltransferase Activity of the Actinobacillus pleuropneumoniae HMW1C-like Protein

Glycosylation of proteins is a fundamental process that influences protein function. The Haemophilus influenzae HMW1 adhesin is an N-linked glycoprotein that mediates adherence to respiratory epithelium, an essential early step in the pathogenesis of H. influenzae disease. HMW1 is glycosylated by HMW1C, a novel glycosyltransferase in the GT41 family that creates N-glycosidic linkages with glucose and galactose at asparagine residues and di-glucose linkages at sites of glucose modification. Here we report the crystal structure of Actinobacillus pleuropneumoniae HMW1C (ApHMW1C), a functional homolog of HMW1C. The structure of ApHMW1C contains an N-terminal all α-domain (AAD) fold and a C-terminal GT-B fold with two Rossmann-like domains and lacks the tetratricopeptide repeat fold characteristic of the GT41 family. The GT-B fold harbors the binding site for UDP-hexose, and the interface of the AAD fold and the GT-B fold forms a unique groove with potential to accommodate the acceptor protein. Structure-based functional analyses demonstrated that the HMW1C protein shares the same structure as ApHMW1C and provided insights into the unique bi-functional activity of HMW1C and ApHMW1C, suggesting an explanation for the similarities and differences of the HMW1C-like proteins compared with other GT41 family members.

Glycosylation of proteins is an essential process that plays an important role in protein structure and function. In recent years, glycoproteins have been identified increasingly in prokaryotes, including pathogenic bacteria. Some bacterial species contain complex O- and N-glycosylation pathways encoded by multiple gene clusters (1,2), and others utilize a glycosyltransferase alone to modify a single target protein. Most bacterial glycoproteins are surface exposed, suggesting that glycosylation may influence interactions with the host (2-5).
Non-encapsulated Haemophilus influenzae is a common cause of localized respiratory tract disease in humans and initiates infection by colonizing the upper respiratory tract (6). Approximately 80% of non-encapsulated H. influenzae clinical isolates express two related high molecular weight proteins called HMW1 and HMW2 that mediate high level adherence to respiratory epithelial cells, a critical step in the pathogenesis of H. influenzae disease (7). HMW1 and HMW2 are encoded by homologous genes designated hmw1A and hmw2A, respectively. The hmw1A gene is flanked by the hmw1B and hmw1C accessory genes, and the hmw2A gene is flanked by the hmw2B and hmw2C accessory genes. The hmw1B and hmw2B genes and the hmw1C and hmw2C genes are highly homologous (8,9).
The HMW1 adhesin is synthesized as a pre-pro-protein that contains an atypical signal peptide (amino acids 1-68), an adjacent pro-piece (amino acids 69-441), and a large exoprotein domain with adhesive activity (amino acids 442-1536) (10-12). HMW1 undergoes an elaborate maturation process and is ultimately presented on the bacterial surface via the two-partner secretion (TPS) pathway, a common secretion pathway in Gram-negative bacteria (12,13). In general, TPS systems consist of a large exoprotein (TpsA) and a cognate outer membrane channel-forming translocator protein (TpsB). In the HMW1 system, HMW1 is the TpsA protein, and HMW1B is the cognate TpsB protein. The HMW1 secretion system is characteristic of a subset of TPS systems and requires an additional protein for secretion called HMW1C (14-16).
Recent work established that the HMW1 adhesin is an N-linked glycoprotein and is modified at over 30 asparagine residues, in all but one case within the conventional N-glycosylation sequon, Asn-X-Ser/Thr (17). Glycosylation plays an essential role in preventing premature degradation of HMW1 during the process of secretion and in promoting tethering of HMW1 to the cell surface, a prerequisite for HMW1-mediated adherence (4). The glycan structures that modify HMW1 are simple mono-hexose or di-hexose sugars containing glucose or galactose rather than the characteristic N-acetylated sugars of N-glycosylation (supplemental Fig. S1), suggesting the presence of a novel glycosyltransferase (17).
The HMW1C protein is located in the cytoplasm and is the enzyme responsible for glycosylating HMW1 (4,18). In recent work, we demonstrated that an Actinobacillus pleuropneumoniae HMW1C ortholog called ApHMW1C is an N-glycosyltransferase capable of transferring glucose and galactose to known asparagine glycosylation sites in HMW1, analogous to HMW1C (19). In addition, both ApHMW1C and HMW1C are capable of creating glucose-glucose linkages in in vitro reactions, accounting for the di-hexose modification of HMW1 (18,19). It is not known whether the di-hexose is formed prior to modification of the acceptor asparagine residue or whether instead a single glucose is linked to the target asparagine and then a second glucose is linked to the first glucose, although the conventional interpretation is that the hexose is added to the protein and then the chain is extended (18). These observations indicate that HMW1C-like proteins are uniquely versatile, harboring N-glycosyltransferase activity that mediates N-linkage to the acceptor protein and O-glycosyltransferase activity that creates di-glucose structures on the acceptor protein. The CAZy database (Carbohydrate-Active Enzymes database, www.cazy.org) currently classifies HMW1C-like proteins as members of the GT41 family, a family that otherwise exclusively contains O-GlcNAc transferases (OGT) with characteristic tetratricopeptide repeats (TPR) at the N terminus (20-23).
In the current study, we set out to elucidate the structure of HMW1C-like proteins and to define the structural and functional differences between HMW1C-like proteins and the OGT members of the GT41 family. Recombinant HMW1C was insoluble at high concentrations and was thus refractory to crystallization. As an alternative, we turned to ApHMW1C, the closest homolog of HMW1C. Here we report the crystal structure of ApHMW1C and structure-function studies of HMW1C, providing fundamental insights into HMW1C-like proteins and expanding our understanding of the GT41 family of glycosyltransferases.

EXPERIMENTAL PROCEDURES
General Materials-Restriction enzymes, Pfu DNA polymerase, and T4 DNA ligase were purchased from New England Biolabs, Stratagene, and Promega, respectively. Primers used for PCR were synthesized by IDT. The peptides (>95% purity) were synthesized by Genscript (Piscataway, NJ). Unless indicated otherwise, chemicals were purchased from Sigma.
Cloning, Expression, and Purification-Methods used for the cloning, expression, and purification of ApHMW1C and HMW1C have been described previously (18,19). Mutations in ApHMW1C and HMW1C were generated using the QuikChange II site-directed mutagenesis kit (Stratagene) and a mutagenic primer set according to the manufacturer's instructions. The plasmid pHMW1-15 encoding HMW1, HMW1B, and HMW1C served as the template for mutations in HMW1C (11).
Protein Crystallization-The purified ApHMW1C protein was concentrated to 9 mg/ml in buffer containing 50 mM HEPES pH 7.0, 0.2 M NaCl, 5% glycerol and 0.1 M EDTA. The solution of peptide pN1131 (NVTVNNNITSHK, corresponding to residues 1131 to 1142 of HMW1) was prepared to a concentration of 10 mM peptide in sterilized water. To obtain ApHMW1C-peptide complex crystals, ApHMW1C and pN1131 solutions (10:1 (v/v)) were mixed and incubated at room temperature for 2 h. The protein-peptide sample was screened against commercial screen solutions (Hampton Research Inc). Small three-dimensional crystals were observed in drops containing PEG 8000. Crystallization was optimized by employing microseeding methods. Sizable crystals (in space group P2₁2₁2₁) of the protein-peptide complex were grown in a mixture of 1.5 µl of protein-peptide solution, 0.3 µl of seed solution, and 1.5 µl of reservoir solution (0.1 M MES pH 6.5, 0.12-0.16 M (NH4)2SO4, and 22-30% PEG 8000) using the hanging drop vapor diffusion method at 17°C for a week.
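The mixing step above can be illustrated with a short dilution calculation. This is a minimal sketch, assuming that "10:1 (v/v)" means ten volumes of the 9 mg/ml ApHMW1C solution to one volume of the 10 mM pN1131 solution; the function name and the volume units are hypothetical and chosen only for illustration.

```python
def mix_concentrations(conc_a, vol_a, conc_b, vol_b):
    """Return the final concentrations of solutes A and B after mixing.

    Each solute is diluted by the combined volume of the two solutions.
    Concentrations keep their input units; volumes only need to share a unit.
    """
    total = vol_a + vol_b
    if total <= 0:
        raise ValueError("total volume must be positive")
    return conc_a * vol_a / total, conc_b * vol_b / total


# Assumed reading of the protocol: 10 volumes of protein (9 mg/ml)
# mixed with 1 volume of peptide (10 mM).
protein_mg_ml, peptide_mM = mix_concentrations(9.0, 10.0, 10.0, 1.0)
print(f"protein: {protein_mg_ml:.2f} mg/ml, peptide: {peptide_mM:.2f} mM")
```

Under this reading, the incubated sample holds roughly 8.2 mg/ml protein and 0.9 mM peptide. Converting this to a molar protein:peptide ratio would require the molecular mass of ApHMW1C, which is not given in this section.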
To obtain native apo crystals, microseeding methods were also used, since the ApHMW1C solution alone (i.e. in the absence of pN1131) only produced P1 crystals. The initial apo crystals were obtained in crystallization conditions similar to those for the protein-peptide complex crystals, using a fresh seed stock prepared from the pN1131::ApHMW1C complex crystals. Subsequently, sizable apo crystals used for data collection were grown in the same conditions, but the seed stock was made from the apo crystals. To obtain UDP-Glc::ApHMW1C complex crystals, 0.5 µl of 10 mM UDP-Glc solution was added directly to the crystallization drops of apo ApHMW1C crystals, which were then incubated at 17°C for 2 h.
Heavy Atom Labeling and Data Collection-For phasing, five different heavy atom compounds (0.5 and 1 mM HgCl2; 1, 5, and 10 mM EMTS (ethyl mercury thiosalicylate, Thimerosal); 1, 5, and 10 mM K2PtCl4; 1, 5, and 10 mM KAu(CN)2; and 1 mM (CH3)Pb(CH3COO)2) were prepared in mother liquor solution (0.1 M MES pH 6.5, 0.12 M (NH4)2SO4 and 23% PEG 8000). Heavy atom derivatives were prepared by soaking apo ApHMW1C crystals in each of these solutions at 17°C for 3, 6, and 24 h. After each time point, crystals were washed by "backsoaking" in mother liquor solution lacking heavy atom compounds. Crystals were cryo-protected with mother liquor solution containing 25% (v/v) glycerol and cooled in liquid N2. Similarly, native apo ApHMW1C crystals, UDP-Glc::ApHMW1C complex crystals, and pN1131::ApHMW1C peptide complex crystals were treated with cryoprotectants and flash cooled in liquid N2. Diffraction data were collected at 1.008 Å for the Hg SAD datasets and at 0.9794 Å for crystals of native apo ApHMW1C, UDP-Glc::ApHMW1C complex, and pN1131::ApHMW1C peptide complex on beamline 19ID at the Advanced Photon Source, Argonne National Laboratory, using an ADSC Quantum 315 CCD detector. All data sets were indexed and integrated with HKL2000 or HKL3000 and scaled with SCALEPACK (24). General handling of the scaled data was carried out with programs from the CCP4 suite.
Structure Determination-The structure was solved by SAD phasing from EMTS-derivatized crystals of apo ApHMW1C using Shelx, MLphare, and dm in HKL3000 (25). SHELXC/D was used to obtain eight mercury sites, and the initial phases were determined by MLphare. After density modification, the starting model was built using Buccaneer in the CCP4 suite. Subsequently, one of the two molecules in the asymmetric unit was manually built using COOT (26). When ~90% of the main chain and ~70% of the side chains had been traced, the structure model was used as a template for molecular replacement (MOLREP) against the native apo crystal dataset. MOLREP found two molecules in the asymmetric unit, with Cc values of 0.465 and 0.590 for the first and second protomers, respectively. The final model of the apo-ApHMW1C structure was obtained after iterative cycles of manual model building with COOT and refinement with REFMAC5 (27). To solve the ligand-complexed structures, one of the two protomers from the final model of the apo structure was used as a template against the UDP-Glc::ApHMW1C complex dataset and the pN1131::ApHMW1C complex dataset. For each structure, modeling and refinement were performed with procedures similar to those used for the apo structure. In the pN1131::ApHMW1C complex refinement, individual coordinate and B-factor refinement and simulated annealing were applied simultaneously using PHENIX (28). In each structure, solvent molecules were assigned at positions where electron density peaks were found above 1.3σ in the 2Fo-Fc map (above 3.0σ in the Fo-Fc map) and hydrogen bonds were stereochemically reasonable. The final model of the apo structure contained residues 1-619 of chain A and residues 4-118, 134-413, and 419-619 of chain B. The final model of the UDP-Glc::ApHMW1C complex contained residues 1-619 of chain A and residues 4-118, 134-412, and 419-619 of chain B. The final model of the pN1131::ApHMW1C complex contained residues 1-619 of chain A and residues 6-119, 134-412, and 420-619 of chain B.
Validation of all three final models was carried out using the Protein Data Bank validation server. Except for Val-106 (both chain A and B), almost all residues are in the most favored regions on the Ramachandran plot. A summary of the data collection and refinement statistics is given in Table 1. Coordinates and structure factors have been deposited at the Protein Data Bank with PDB ID 3Q3E (apo ApHMW1C), 3Q3H (UDP-Glc::ApHMW1C complex), and 3Q3I (pN1131::ApHMW1C complex).
Glycosyltransferase Assay of ApHMW1C and Data Analysis-Glycosyltransferase activity of ApHMW1C was assessed as previously described (19,29). Experimental details are given in the supplemental information.
Glycosyltransferase Assay of HMW1C-In vitro glycosyltransferase assays were performed as described previously (18). Briefly, 1.5 µg of purified HMW1C or mutant HMW1C, 1.5 µg of purified HMW1 (residues 802-1406), and 20 µl of 50 mM UDP-α-D-glucose (Calbiochem) were combined in a final volume of 150 µl in 25 mM Tris pH 7.2, 150 mM NaCl. Samples were incubated for 60 min at room temperature, then further incubated at 4°C overnight, and then resolved on an SDS-PAGE gel. Protein was transferred to a nitrocellulose membrane, and glycosylation was detected using DIG Glycan reagents (Roche).
Adherence Assays-Adherence assays were performed with Chang epithelial cells (human conjunctiva; ATCC CCL 20.2) (Wong-Kilbourne derivative clone 1-5c-4) as described previously (30). Escherichia coli expressing HMW1, HMW1B, and either wild type or mutant HMW1C was prepared by inoculating LB broth containing ampicillin to select for pHMW1-15 or the relevant plasmid derivative and incubating overnight. Percent adherence was calculated by dividing the number of adherent colony-forming units by the number of inoculated colony-forming units and multiplying by 100. All strains were examined in triplicate, and assays were repeated three times.
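The percent adherence calculation described above can be sketched as follows. This is an illustration only: the colony counts are invented placeholder values rather than data from this study, and the function name is hypothetical.

```python
from statistics import mean


def percent_adherence(adherent_cfu, inoculated_cfu):
    """Adherent colony-forming units as a percentage of the inoculum."""
    if inoculated_cfu <= 0:
        raise ValueError("inoculated CFU must be positive")
    return 100.0 * adherent_cfu / inoculated_cfu


# Hypothetical triplicate measurements: (adherent CFU, inoculated CFU).
replicates = [(4.0e5, 1.0e6), (5.0e5, 1.0e6), (4.5e5, 1.0e6)]
values = [percent_adherence(a, i) for a, i in replicates]
mean_adherence = mean(values)
print(f"mean adherence over {len(values)} replicates: {mean_adherence:.1f}%")
```

Averaging the per-replicate percentages, rather than pooling the counts, is one reasonable choice when each replicate has its own inoculum count; the text does not specify which was used.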
Protein Analysis-Whole cell lysates were prepared by suspending bacterial pellets in 10 mM HEPES, pH 7.4 and sonicating to clarity. Proteins were resolved by SDS-PAGE using 7.5% polyacrylamide gels. Western blots were performed using guinea pig antiserum GP96 against the HMW1 protein or guinea pig antiserum 64 against HMW1C.

RESULTS
Structure Determination of ApHMW1C-In initial experiments we purified recombinant H. influenzae HMW1C for crystallography. However, purified HMW1C precipitated at high concentrations, precluding crystallization. As an alternative, we turned to A. pleuropneumoniae HMW1C (ApHMW1C), which is 65% identical and 85% similar to HMW1C and shares the same ability to glycosylate the H. influenzae HMW1 adhesin in vivo and in vitro (19). We focused first on native and selenomethionine-derivatized ApHMW1C proteins but generated only thin plate-shaped triclinic (P1) crystals with high mosaicity. As an alternative approach, we exploited knowledge of the carbohydrate modification sites of HMW1 (17) and synthesized peptides corresponding to known HMW1 glycopeptides. Co-crystallization of ApHMW1C with the NVTVNNNITSHK peptide (referred to as pN1131) resulted in crystals in the space group P2₁2₁2₁ diffracting to 2.45 Å resolution. Using these crystals and seeding techniques, we obtained apo crystals and UDP-Glc containing crystals, which diffracted to 2.1 Å and 2.25 Å resolution, respectively. The ApHMW1C structure was solved by SAD phasing with an EMTS derivative of the apo crystals and refined for three different crystal systems: apo form, UDP-Glc::ApHMW1C (the glucose moiety was not observed in the electron density map), and pN1131::ApHMW1C (although the peptide was critical for crystallization, no electron density was observed for the peptide) (Table 1). The two protomers in the asymmetric unit did not have significant molecular contacts, indicating that the functional enzyme is a monomer, consistent with previous biochemical studies (19).
Overall Structure of ApHMW1C-The ApHMW1C structure consists of three discrete domains, including an all α-helical domain (referred to as AAD) at the N terminus (residues 1 to 257) and two Rossmann-like domains that create a GT-B fold at the C terminus (residues 258 to 620) (Fig. 1). Together the AAD and the two Rossmann-like domains form a quasi-equilateral (~70 Å) triangular face. The AAD fold contains 13 consecutive α-helical motifs and appears to be conserved among HMW1C-like proteins, as demonstrated by the structure-based alignment of ApHMW1C and 3 other HMW1C-like proteins (Fig. 2A). Although the AAD of ApHMW1C and TPR repeats both adopt all-α structures, the fold of the AAD differs from the fold of TPR repeats (Fig. 2, B and C), providing strong structural evidence that HMW1C-like proteins are distinct from other members of the GT41 family (20-23).
The GT-B fold of ApHMW1C contains the GT-1 domain (residues 258 to 403), the GT-2 domain (residues 427 to 620), and an inter-domain region that connects GT-1 and GT-2 (residues 404 to 426) (Fig. 1, A-C). The GT-1 and GT-2 domains have a similar core structure (β/α/β folds) and form the UDP-sugar binding site at their interface. The long helical tail that includes C-terminal helices α24, α25, and α26 (residues 578-618) extends from the GT-2 domain back to the GT-1 domain and secures interdomain (GT-1 versus GT-2) contacts, similar to observations for most GT-B fold glycosyltransferases (reviewed in Refs. 31,32). The sequence alignment and secondary structure assignment of ApHMW1C and HMW1C (Fig. 1, A-C, and supplemental Figs. S2 and S3) highlight that these proteins are virtually identical structurally except for a disordered 30-residue N-terminal tail in HMW1C (DISORDER2 Disorder Prediction server (33)).
The two protomers of each crystal system superimposed with an rmsd of 0.56 Å in apo-ApHMW1C, 0.45 Å in UDP-Glc::ApHMW1C, and 0.51 Å in pN1131::ApHMW1C. Pairwise rmsd values among the six protomers from these three individual crystal systems ranged from 0.23 Å (apo chain A versus UDP complex chain A) to 0.61 Å (apo chain B versus peptide complex chain A). Overall, the six protomers from these three crystal systems superimposed well, except for three highly flexible peptide segments (Fig. 1D). Two of these segments (residues 1-3 and residues 119-133, the latter forming the loop between α6 and α7) correspond to the most variable sequence regions of the AAD (Fig. 2A) and protrude from the triangle plane in the opposite direction. The third segment (residues 414-418) belongs to the inter-domain region, which does not contain secondary structure and appears to be highly flexible.
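For readers unfamiliar with the rmsd values quoted above, the quantity can be sketched in a few lines. This is a minimal illustration that assumes the two coordinate sets are already superimposed and list equivalent atoms in the same order; it does not perform the optimal superposition itself, and the coordinates shown are invented rather than taken from the deposited structures.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two paired sets of 3D points.

    Assumes the sets are already superimposed and that point i in
    coords_a is equivalent to point i in coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and equal in length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Invented coordinates (in Å) for two equivalent atoms in each protomer.
chain_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
chain_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(f"rmsd = {rmsd(chain_a, chain_b):.2f} Å")
```

In practice the superposition and the choice of atoms compared are handled by structure-analysis software; the text does not state which atoms were used for the values reported here.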
UDP-sugar Binding Pocket-The difference Fourier electron density maps calculated using data collected from apo and UDP-Glc soaked crystals revealed a clear density for the UDP moiety, which is almost completely buried in the interdomain cleft between GT-1 and GT-2 and makes extensive contact with residues of the GT-2 domain (Figs. 1B and 3A). Interestingly, UDP occupied slightly different conformations in the two molecules of the asymmetric unit (referred to as UDP-A and UDP-B), suggesting a possible mechanistic snapshot of the enzyme. No electron density was observed for the glucose moiety on either protomer, suggesting release from the UDP molecule. The UDP binding pocket is defined by the C-terminal ends of β9 (residues 437-441) and β10 (residues 468-471), the loop between β11 and α20 (residues 495-497), and the N-terminal ends of α20 (residues 498-501) and α21 (residues 519-522) (Figs. 3A and 5A).
The uracil bases in both UDP conformations are stabilized by a stacking interaction with the side chain of Tyr-501 and further van der Waals contacts with Gly-468. The N3 and O4 atoms on both uracil bases make hydrogen bonds with Ser496 N and O and with Pro494 O and Ser496 OG via a water molecule (Fig. 3A). Both ribose rings make hydrogen bonds with Asp-525 and make additional van der Waals contacts with Tyr-521. The O2′ and O3′ atoms of UDP-A make further water molecule-mediated hydrogen bonds with Asp-525. The phosphate groups on UDP-A make hydrogen bond interactions with Lys441 NZ, Gln469 OE1 (via a water molecule), and Asn521 N and ND2. The phosphate groups on UDP-B interact with Lys441 NZ, Asn519 OD1 (via a water molecule), Thr520 N, Asn521 N, and Gly522 N. While the binding of UDP yielded only subtle changes in the overall ApHMW1C structure, binding appears to foster localized conformational changes. The side chain of Gln-469 sways into the UDP binding pocket, placing its amido group near the phosphate group of UDP. The side chain of Asn-521 swings away from UDP, allowing the main chain N and the side chain NH2 to form critical hydrogen bonds with β-phosphate oxygens O1B and O3B and thus to resolve electrostatic clashes. Additionally, localized shifts are observed for the main chains of the loops between β10 and α19, β11 and α20, and β12 and α21, resulting in a fine-tuning effect of the binding pocket in the presence or absence of the UDP-sugar donor.
To test the mechanism of UDP-sugar binding, we generated a number of ApHMW1C variants with point mutations and examined enzymatic activity using a continuous spectroscopy assay (19). Lys-441, Asn-521, and Asp-525 form critical interactions with the ribose and phosphate moieties, and mutation of these residues eliminated enzymatic activity. Similarly, mutation of Tyr-498 resulted in a marked decrease in enzymatic activity (~9% of wild type) (Table 2). In contrast, mutation of Thr-438 had little effect on enzymatic activity, consistent with the fact that this residue does not make specific contact with UDP in the ApHMW1C structure (Fig. 3A).
The Interface between the AAD and GT-B Domains and the Acceptor Protein Binding Groove-The crystal structure of ApHMW1C revealed extensive contacts between the AAD and the GT-B domain, creating a unique groove adjacent to the UDP-sugar binding pocket (Figs. 1C, 2A, and 4A). The narrow end of the groove is ~7 Å wide, and the wide end measures ~18 Å. The hydrophobic core residues in the AAD and the GT-B domain are absolutely conserved between ApHMW1C and HMW1C and are highly conserved among representative HMW1C-like proteins. The surface of the groove is also remarkably conserved between ApHMW1C and HMW1C, including His214/His241, Asp215/Asp242, Met218/Met245, His219/His246, Tyr222/Tyr249 (from α12 HDVYMHCSY), His272/His298, Met349/Met375, and Asp350/Asp376 (supplemental Fig. S3 and Fig. 4A). The close association of the AAD and the GT-B domain renders the overall enzyme structure distinctive and rather rigid, perhaps providing an explanation for the lack of significant conformational changes among different crystal systems (apo versus complex structures).
The structure-based sequence alignment revealed that the absolutely conserved residue His-277 corresponds to a putative catalytic base proposed in XcOGT homologs and is adjacent to the UDP binding pocket (Figs. 3A, 4, A and B, and 5A). As the first step to delineate the acceptor protein binding site and the catalytic mechanism, we examined a number of ApHMW1C mutants with point mutations affecting the region adjacent to the UDP-binding site and the groove (Tables 2 and 3). While most of the mutants showed decreased enzyme activity, mutation of Asp-215 resulted in null enzymatic activity. Interestingly, while the H277D mutant showed no apparent activity, the H277A mutant showed low specific activity (~5% of wild type).
Structure-Function Analysis of HMW1C-Based on the ApHMW1C-UDP complex structure and the HMW1C model, we predicted that Lys-467, Asn-547, and Asp-551 are key residues for UDP binding in HMW1C (Fig. 3A). To test these predictions, we generated a series of HMW1C point mutations and then examined the effect of the mutant proteins on HMW1 glycosylation, HMW1 tethering to the bacterial surface, and HMW1-mediated adherence to human epithelial cells. As controls, we used wild type HMW1C and an HMW1C variant with a mutation of Thr-464 (Thr-438 in ApHMW1C), a residue that is predicted to be unrelated to UDP-sugar binding. As shown in Fig. 3B, mutation of Lys-467, Asn-547, or Asp-551 abrogated glycosylation as assessed by in vitro glycosylation assays using the purified HMW1C derivatives, a purified fragment of HMW1, and UDP-glucose and assessing glycosylation using DIG-Glycan reagents. As shown in Fig. 3C, examination of whole cell sonicates of bacteria expressing HMW1 and HMW1B with either wild type HMW1C or HMW1C-T464A revealed 160 kDa and 125 kDa bands corresponding to the glycosylated HMW1 pre-pro-protein and the glycosylated HMW1 mature protein, respectively. In contrast, in bacteria expressing HMW1 and HMW1B with HMW1C-K467A, HMW1C-N547A, or HMW1C-D551A, the HMW1 pre-pro-protein and the HMW1 mature protein were less abundant and migrated at lower apparent molecular masses, consistent with a lack of glycosylation. Analysis of adherence demonstrated that bacteria expressing HMW1C-K467A, HMW1C-N547A, or HMW1C-D551A were nonadherent in assays with Chang epithelial cells (Fig. 3D).
To assess whether residues along the groove at the interface of the AAD and the GT-1 domain are critical for HMW1C binding of the HMW1 acceptor protein or sugar moiety (Fig. 4A), we generated a point mutant involving His-303 (His-277 in ApHMW1C) and a triple mutant involving Asp-242, His-246, and Tyr-249 and then examined these derivatives in whole bacteria expressing HMW1 and HMW1B for an effect on HMW1 glycosylation as assessed by Western analysis and adherence assays. As shown in Fig. 4, C and D, mutation of His-303 by itself and of Asp-242, His-246, and Tyr-249 together resulted in a change in molecular mass of the HMW1 pre-pro-protein and the HMW1 mature protein and a loss of HMW1-mediated adherence, consistent with elimination of HMW1 glycosylation.
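The residue correspondences between ApHMW1C and HMW1C quoted in the text can be collected into a small lookup, which makes the numbering shift between the two proteins explicit. This is only a bookkeeping sketch: the pairs are those stated in this article, whereas the nearest-pair offset rule and the function names are assumptions introduced here for illustration and are no substitute for the sequence alignment in the supplemental figures.

```python
# ApHMW1C residue number -> HMW1C residue number, as paired in the text.
STATED_PAIRS = {
    214: 241, 215: 242, 218: 245, 219: 246, 222: 249,  # groove residues in the AAD
    272: 298, 277: 303, 349: 375, 350: 376,            # groove residues in the GT-B fold
    438: 464,                                          # Thr control residue
}


def offset_near(ap_residue):
    """Numbering offset taken from the closest stated pair (a heuristic)."""
    nearest = min(STATED_PAIRS, key=lambda r: abs(r - ap_residue))
    return STATED_PAIRS[nearest] - nearest


def to_hmw1c(ap_residue):
    """Estimate the HMW1C number for an ApHMW1C residue (assumed rule)."""
    return ap_residue + offset_near(ap_residue)


offsets = sorted({h - a for a, h in STATED_PAIRS.items()})
predicted = [to_hmw1c(r) for r in (441, 521, 525)]
print("offsets among stated pairs:", offsets)
print("ApHMW1C 441, 521, 525 map to HMW1C", predicted)
```

Applying this rule to the ApHMW1C UDP-binding residues Lys-441, Asn-521, and Asp-525 gives 467, 547, and 551, which agrees with the HMW1C residues chosen for mutagenesis.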

HMW1C-like Proteins Define a Novel Subfamily of the GT41 Family-In this study we report the first structure of an HMW1C-like protein, defining a new subfamily of the GT41 family characterized by versatile catalytic activities that include N-glycosylation of protein acceptor sites and O-glycosylation of sugar acceptor sites (18,19). The unique architecture of ApHMW1C consists of an N-terminal all α-domain (AAD) fold and a C-terminal GT-B fold. The AAD fold differs from the TPR fold found in the other members of the GT41 family. The GT-B fold contains the GT-1 and GT-2 domains and harbors the binding site for UDP-hexose. The interface of the AAD and the GT-B domain creates a unique groove with potential to accommodate the acceptor protein. Based on kinetic analyses of active site mutants of ApHMW1C, we validated the critical role of key residues of the UDP-hexose binding pocket in glycosylation of HMW1. Using the structure-guided HMW1C model, site-directed mutagenesis, and glycosylation assays, we demonstrated that the H. influenzae HMW1C protein adopts the same structure as ApHMW1C, with critical residues for binding UDP-hexose including Lys-467, Asn-547, and Asp-551 in the GT-2 domain. In addition, we delineated the binding region for the HMW1 acceptor protein at the interface groove between the AAD fold and the GT-1 domain.

Structural Comparison with XcOGT, an OGT-like GT41 Member-The GT41 family contains both bacterial and eukaryotic enzymes and was previously believed to include just O-GlcNAc transferases (OGT), with a characteristic TPR domain at the N terminus. Limited structural information is available for the GT41 family, namely the structure of a bacterial OGT homolog called XcOGT and the structure of human OGT (20-23). The TPR domain of XcOGT contains three complete TPR repeats that form the standard TPR superhelix and are followed by two extra pairs of antiparallel α-helices called TPR-like repeats (TLRs). Although the ApHMW1C AAD and the XcOGT TPR regions (sharing ~11% sequence identity) have two different folds according to DALI (34) searches, we were able to manually superimpose the last four helices of the AAD with the corresponding helices of XcOGT (Fig. 2, B and C). Both the ApHMW1C AAD and the XcOGT TPR regions are closely associated with the GT-B domain. However, the resulting molecular surfaces of the two proteins are very distinct in shape and chemical properties (Fig. 4, A and B), suggesting an explanation for the different donor and acceptor specificities between HMW1C-like proteins and other OGT members of the GT41 family.
While the closest structural homolog of the ApHMW1C GT-B domain was the XcOGT GT-B domain (DALI Z-score, 30; rmsd, 3.1 Å; and 17% sequence identity over 332 residues of the GT-B domain), the structure-based sequence alignment of GT-B domains clearly distinguishes two subfamilies of GT41 members, namely HMW1C-like proteins and OGT-like proteins (Fig. 5, A and B). Consistent with the different mechanistic strategies utilized by these subfamilies, GT41 members show appreciable resemblance only in the UDP-binding pocket, the common structural moiety for substrates, with conservation of critical amino acids interacting with UDP (Fig. 6, A and B). At the same time, there are important dissimilarities between the UDP-binding pockets in HMW1C-like proteins and OGT-like proteins. In particular, Asn-385 in XcOGT (mapping to Gln-839 in hOGT) makes a hydrogen bond with the α-phosphate group of UDP, whereas the corresponding residue in ApHMW1C (Thr-438 in ApHMW1C, mapping to Thr-464 in HMW1C) makes no direct contact with UDP. Our efforts to obtain the structure of the complete UDP-Glc substrate bound to ApHMW1C have been unsuccessful so far, probably reflecting the cleavage activity of ApHMW1C in the absence of the HMW1 protein acceptor, similar to reports of T4 phage β-glucosyltransferase (35,36). Nevertheless, on the basis of the structural comparison with other UDP-sugar complex structures, we predicted the plausible position of the glucose moiety for ApHMW1C (Fig. 6, A and B, and supplemental Fig. S4). As expected, no clearly conserved residues of the sugar moiety sites were detected, reflecting the fact that the HMW1C-like proteins are specific for UDP-hexoses while the OGTs are specific for UDP-GlcNAc.
Comparison with Other Structural Homologs-Searches for structural homologs of the AAD of ApHMW1C (residues 1-257) using DALI revealed that the first 5 helices of the AAD have limited similarity to the C-terminal domain of glutathione S-transferase (GST), aligning with an rmsd of 2.7 Å and a Z-score of 5.3 for 79 Cα atoms (Fig. 2B). Thus, the AAD in ApHMW1C appears to have a composite structure, with a partial GST-like motif at the N terminus and TLRs at the C terminus, making the AAD distinct from other α-helical bundle structures.
Searches for structural homologs of the GT-B fold in ApHMW1C (residues 258 to 620) revealed a number of other GT-B proteins, including members of the glycogen synthase 1 family (Z-score, ~19; rmsd, 4.2-4.8 Å; and 7-9% sequence identity on 308-316 residues) and the GT4 family (Z-score, ~19; rmsd, 4.8-5.2 Å; and 6-8% sequence identity on 303-311 residues). Glycogen synthases are classified in two large GT families, namely the GT3 and GT5 families (31). Animal and yeast glycogen synthases belong to the GT3 family and use UDP-glucose as the glucose donor (37), while bacterial glycogen synthases are grouped in the GT5 family and use ADP-glucose exclusively as the glucose source (38). Currently, glycogen synthase structures are available from Agrobacterium tumefaciens, E. coli, and Pyrococcus abyssi, all from the GT5 family (39-41). Unlike HMW1C-like proteins and OGTs, these glycogen synthases contain the GT-B domain alone, without an extra N-terminal appendage (Fig. 5B). Although the nucleotide bound in available crystal structures of glycogen synthases is ADP rather than UDP, these enzymes transfer glucose, an important common feature with HMW1C-like proteins. These enzymes catalyze O-glycosidic bond formation between glucoses, another shared characteristic with HMW1C-like proteins. Comparison of ApHMW1C with structures of glycogen synthases containing glucose or the glucose polymer analog HEPPSO did not show clearly conserved residues in the glucose-binding site but suggested that ApHMW1C can accommodate a di-glucose in the reaction center (Fig. 6C and supplemental Fig. S4), consistent with the observation that ApHMW1C catalyzes O-glycosidic bond formation. In addition, the position of the sugar moiety in these structures is consistent with the position of the sugar analog in XcOGT (Fig. 6, B and C).
The GT4 family is perhaps the largest of all the GT families (32) and contains sucrose synthase, α-glucosyltransferase, and diglucosyl diacylglycerol synthase, among others. Based on the wide range of donor and acceptor substrates, the GT4 family appears to have both functional and sequence diversity (42)(43)(44). When we discovered that HMW1C and ApHMW1C are glycosyltransferases with structural homology with the GT41 family, it was surprising to consider that enzymes in the same family would be capable of generating peptide N-linked, peptide O-linked, and sugar O-linked glycosides, especially since these activities would be anticipated to employ different mechanistic strategies. The current study clearly illustrates the unique characteristics of HMW1C-like proteins that combine features from several GT families to accommodate versatile activities.
Molecular Insights into the H. influenzae HMW1C Structure and Function-Given the high sequence identity between ApHMW1C and HMW1C, the ApHMW1C structure provided a basis for probing the molecular mechanism of HMW1C glycosyltransferase activity. Site-directed mutagenesis demonstrated that the UDP-hexose binding pocket in HMW1C is absolutely conserved with the pocket in ApHMW1C. The structure also revealed a funnel-shaped groove adjacent to the UDP-hexose binding site with an orientation and configuration that suggested a mechanism for accommodating the acceptor protein. Indeed, mutation of Asp-242, His-246, and Tyr-249 in this groove abolished glycosylation of HMW1, consistent with the conclusion that this groove is critical for binding the acceptor protein. Based on previous studies of XcOGT and the crystal structure of ApHMW1C, we predicted that His-303 (His-277 in ApHMW1C, His-218 in XcOGT, and His-558 in human OGT) is the catalytic base. In fact, this His residue is invariant in the GT41 family (Fig. 5A). However, mutant ApHMW1C-H277A retained low but appreciable activity, raising a question about the identity of the catalytic base. Considering the position of this His residue in the active site and the observation that mutants ApHMW1C-H277D and HMW1C-H303R lack activity, we suspect that this residue is important for binding the sugar moiety or the acceptor protein rather than serving as the catalytic base. Based on the observation that mutant ApHMW1C-D215A lacked activity, we speculate that Asp-215, an absolutely conserved residue in HMW1C-like proteins (Asp-242 in HMW1C), plays a critical role in the recognition of the acceptor substrate. The recent structure of human OGT indicated that His-498 of human OGT is the probable catalytic base, not the previously proposed His-558 (23). This His-498 residue is not conserved in XcOGT but corresponds to Phe-165, which is located in helix α9 of the XcOGT structure (Fig. 2C).
A structure overlay aligned Tyr-222 of helix α12 in ApHMW1C (Tyr-249 in HMW1C) with Phe-165 of XcOGT (Figs. 2, B and C, and 4B).
In conclusion, this study demonstrates the structural basis for the glycosyltransferase activity in HMW1C and ApHMW1C, members of a novel subfamily of the GT41 family of glycosyltransferases. The HMW1C-like proteins share features of glycogen synthases and OGTs, in part accounting for their dual function as glycosyltransferases that catalyze N-linkage to HMW1 and O-glycosidic bonds between glucose residues on HMW1.