Specificity of Processing α-Glucosidase I Is Guided by the Substrate Conformation

Background: The enzyme “GluI” is key to the synthesis of critical glycoproteins in the cell. Results: We have determined the structure of GluI and modeled binding of its unique sugar substrate. Conclusion: The specificity of this interaction derives from a unique conformation of the substrate. Significance: Understanding the mechanism of the enzyme is of basic importance and relevant to the potential development of antiviral inhibitors. Processing α-glucosidase I (GluI) is a key member of the eukaryotic N-glycosylation processing pathway, selectively catalyzing the first glycoprotein trimming step in the endoplasmic reticulum. Inhibition of GluI activity impairs the infectivity of enveloped viruses. However, despite interest in this protein from structural, enzymatic, and therapeutic standpoints, little is known about its structure or about its mechanism of catalysis on the unique glycan substrate Glc3Man9GlcNAc2. The first structural model of a eukaryotic GluI is presented here at 2-Å resolution. Two catalytic residues are proposed; mutating them yields catalytically inactive but properly folded protein. The active site of GluI was mapped by docking with AutoDock Vina, using the known substrate and inhibitors as ligands, including a novel inhibitor characterized in this work. From these results, we have formulated a model of substrate binding that is most likely conserved in mammalian GluI.

N-Glycosylation, the addition of a glycan to an asparagine residue, is the most common post-translational modification in eukaryotes, with over half of all eukaryotic proteins estimated to be glycosylated (1). The presence and identity of an N-glycan on a protein affect stability, folding, and intermolecular interactions. More broadly, N-glycans play critical roles in modulating reaction kinetics, intracellular protein trafficking, and cell-cell adhesion and communication. Several enveloped viruses require N-glycosylation of their coat proteins for successful infectivity (2); as such, the N-glycosylation pathway is of key antiviral therapeutic interest. Thus, structural and mechanistic investigations into enzymes that mediate N-glycosylation are of fundamental importance to studies of human health and physiology.
Assembly and processing of the protein-glycan conjugate take place in the endoplasmic reticulum and Golgi in an intricate, branching, multistep system (3). At the initial stage of this pathway, the enzyme "processing α-glucosidase I" (GluI) catalyzes the selective removal of the terminal glucose from the newly linked glycoprotein in a co-translational process. GluI holds a key regulatory position in the N-glycosylation pathway by maintaining forward momentum of the glycan transfer reaction and by working in conjunction with the folding quality control system (4-6). Loss or inhibition of GluI prevents further glycoprotein processing in the endoplasmic reticulum and also affects the resident populations of lipid-linked and free oligosaccharides (7-9). Of therapeutic relevance, inhibition of GluI activity reduces the assembly and infectivity of several enveloped viruses, including hepatitis B and C, influenza, HIV, and others (10-14). However, the known inhibitors are not specific, which results in undesirable side effects. A specific GluI inhibitor could reduce viral infectivity while avoiding off-target interactions. Knowledge of the structure and catalytic mechanism of GluI would greatly aid the design or discovery of such an inhibitor.
GluI is a single-pass type II transmembrane protein of ~80-110 kDa. The bulk of the protein, including the catalytic region, lies in the endoplasmic reticulum lumen (15-18). The biological substrate for GluI is Glc3Man9GlcNAc2, whether dolichol-linked, protein-linked, or as a free oligosaccharide. GluI cleaves the terminal glucose-α(1→2)glucose glycosidic linkage, releasing glucose. In all eukaryotic homologs tested, GluI is specific for this linkage. The minimum cleavable substrate is a glucotriose with the α(1→2) and α(1→3) linkages found in the native substrate (15, 19-22). The glucose-α(1→2)glucose disaccharide, kojibiose, weakly inhibits GluI activity (15, 23). Interestingly, this glucotriose is documented in biology only in the eukaryotic N-glycosylation pathway; thus, the relationship between this enzyme and this substrate is unique in biology.
As a glycoside hydrolase (GH), GluI is a member of CAZy GH family 63 and clan GH-G, whose members operate via an inverting mechanism; the catalytic acid and base of GluI are not definitively known (24, 25). A substrate-binding motif in the rat and mammalian homologs has been proposed (26), but no eukaryotic structures have been determined. Two prokaryotic GH63 structures have been solved: the Escherichia coli homolog YgjK (27) (PDB code 3D3I) and the Thermus thermophilus homolog TTHA0978 (PDB code 2Z07, RIKEN structural genomics). Both structures contain an (α/α)6 toroid fold, and YgjK also has an N-terminal super-β-sandwich domain. Neither structure is similar enough to mammalian GluI to serve as a realistic atomic-level model.
Much of what we know about GluI comes from studies of the Saccharomyces cerevisiae enzyme, Cwh41p. We have overexpressed this enzyme in Pichia pastoris and stably purified it as a transmembrane-deletion construct, Cwht1p (28). Cwh41p and human GluI share 24% overall identity and 34-59% identity in the catalytically active C-terminal domain (17), so similar structures are expected. Yeast and human GluI share similar substrate specificity, pH optimum, and inhibitor sensitivity (19, 29). In both enzymes, chemical modification of arginine, tryptophan, or cysteine residues inactivates the enzyme (26, 30, 31). Thus, the yeast enzyme is a good experimental model for studying the structure, substrate specificity, and enzymatic mechanism of human GluI.
In this work, we have determined the structure of Cwht1p at 2-Å resolution. Based on structural similarity, the active-site residues are proposed to be a glutamate (Glu771) and an aspartate (Asp568) in the center of the (α/α)6 barrel that forms the catalytic C-terminal region. Crystal packing prevents experimental investigation of the active site, which is occluded at a crystal contact by the His6 purification tag of a symmetry-related molecule. Therefore, the active site was investigated in silico by docking small ligands. These analyses indicate a basis for the substrate specificity of GluI and identify features important for inhibitor development.

EXPERIMENTAL PROCEDURES
Data Collection, Structure Phasing, and Refinement-The growth of diffracting Cwht1p crystals was described in our previous work (28). Phases were obtained by a heavy-atom approach. A panel of heavy-atom compounds was first screened with a gel-shift assay to prioritize candidates (32). The successful phasing signal came from ethyl mercuric phosphate, which was soaked into the crystal at 1 mM for 16 h before freezing.
Cwht1p crystals were looped into 1:4 Paratone:mineral oil (Hampton) as cryoprotectant and flash-frozen in a nitrogen stream at 100 K. Diffraction data for the native crystals were collected on an ADSC Quantum-4 CCD detector at beamline F-1 of the Cornell High Energy Synchrotron Source (CHESS). Data from the mercury-derivative crystals were collected on an ADSC Quantum-315 CCD detector at beamline 14-BM-C (BioCARS) of the Advanced Photon Source (APS), Argonne National Laboratory.
All diffraction data were processed using HKL-2000 (33). Single-wavelength anomalous diffraction (SAD) phasing was performed with AutoSol within the PHENIX (Python-based Hierarchical Environment for Integrated Xtallography) program package (34).
GluI crystallized in space group P212121, with one molecule in the asymmetric unit. The Matthews coefficient is 2.6 Å3/Da, corresponding to 49.5% solvent. SAD phasing with the mercury-derivative dataset located eight heavy-atom sites, with a phasing figure of merit of 0.493 and a Bayes correlation coefficient of 53.3. Following phasing and density modification, AutoBuild in PHENIX automatically built an initial model of 588 residues, with R/Rfree of 0.341/0.366 (34). This initial model was manually rebuilt in Coot and refined with Refmac in CCP4 against the native dataset. The final model contains 788 of 813 residues, with R/Rfree of 0.178/0.204 (35-37). The MolProbity web server was used to evaluate structure quality during and after refinement (38). MolProbity reports no Ramachandran outliers and no Cβ deviations. The fraction of poor rotamers (0.57%) is within the acceptable range (<1%). The coordinates of the final refined model have been deposited in the Protein Data Bank (39) under code 4J5T.
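The R and Rfree values quoted above follow the standard crystallographic definition, R = Σ||Fobs| − k|Fcalc|| / Σ|Fobs|. Rfree is computed the same way over a small test set of reflections excluded from refinement. The sketch below illustrates only that definition. It assumes the amplitudes are already on a common scale, with a single scale factor k. Refmac and PHENIX apply more elaborate scaling and bulk-solvent corrections. The function names are hypothetical.

```python
def r_factor(f_obs, f_calc, scale=1.0):
    """Crystallographic R = sum(| |Fobs| - k*|Fcalc| |) / sum(|Fobs|)."""
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("need equal-length, non-empty amplitude lists")
    numerator = sum(abs(abs(fo) - scale * abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator


def r_work_r_free(f_obs, f_calc, is_free, scale=1.0):
    """Return (R_work, R_free); R_free uses only reflections flagged as the test set."""
    work = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, is_free) if not free]
    test = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, is_free) if free]
    r_work = r_factor([fo for fo, _ in work], [fc for _, fc in work], scale)
    r_free = r_factor([fo for fo, _ in test], [fc for _, fc in test], scale)
    return r_work, r_free
```

Because the test-set reflections never guide refinement, Rfree is normally somewhat higher than R for a well-refined model.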
Cwht1p Structure Analysis, Comparison, and Prediction Methods-A variety of software packages and web servers were used to evaluate and analyze the Cwht1p structure. Crystal packing interfaces were evaluated with the PDBePISA (Protein Interfaces, Surfaces, and Assemblies) web server (40). Cwht1p was queried against the Protein Data Bank with the DALI Lite version 3 server to identify similar structures for comparative analysis (41). For domain-based queries, the Cwht1p model was split into the N-domain (including the linker region; residues 1-386) and the C-domain (excluding the non-native C-terminal tag; residues 287-800). Pairwise structural comparisons and r.m.s. deviation calculations were performed with the Dali pairwise server. Structure-based sequence alignments were performed with the PROMALS3D server (42). Electrostatic surface calculations were performed with CHARMM through the PBEQ server (43). Structural figures were generated with PyMOL. Two-dimensional interaction diagrams were adapted from PoseView server output (44).
Double mutants (DAEA, DNEQ) were constructed sequentially from the single mutants; all mutations were confirmed by sequencing. The plasmid PCR products generated with these primers were digested with DpnI to remove the non-mutated parental template. They were then transformed into P. pastoris X33 by the PEG transformation method described for native Cwht1p (28). The six single and double mutant proteins were purified and tested for activity using the same protocols as native Cwht1p (28).
Note that residue numbering in this work is consistent with our previous Cwht1p publication (28). Numbering begins after the 33 additional residues present in full-length Cwh41p. Thus, residue 1 is methionine. The Cwht1p construct expressed here from pPICZαAΔXhoI contains 4 vector-derived residues preceding it (Glu−3 to Phe0). Both the N and C termini of this protein are non-native, because it is an N-terminal transmembrane-deletion construct with a C-terminal His6 tag.
For low-resolution structural characterization of the mutants, circular dichroism (CD) was performed. CD spectra of the native and mutant proteins were measured on an Aviv model 62 DSA spectrometer. Samples contained 3.5 μM protein in 20 mM sodium phosphate, pH 6.8, 100 mM NaCl. Spectra were collected from 320 to 205 nm with a 1.0-nm bandwidth and a 1-mm path length.
Inhibitor Screening-Michaelis-Menten kinetic parameters have been determined for Cwht1p with synthetic trisaccharide and tetrasaccharide substrates, which correspond to subregions of the Glc3Man9GlcNAc2 biological substrate (28, 45). The same assay was used here to screen a panel of compounds for inhibition of Cwht1p. Briefly, Cwht1p is incubated with the tetrasaccharide substrate for 10 min, and the reaction is then quenched. Product concentration is determined with the colorimetric glucose oxidase assay (Sigma). Compounds dissolved in dimethyl sulfoxide (supplemental Fig. S1) were preincubated with the enzyme for 10 min before substrate addition. Enzymatic activity was calculated relative to the control. Compounds were initially assayed at 1 mM final concentration. Those showing substantial inhibition were then tested over a range of concentrations.
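The activity-relative-to-control calculation can be sketched as follows. This is a minimal illustration, not the authors' calculation. It assumes the glucose oxidase readout is proportional to the glucose released. It also assumes that a no-enzyme blank is subtracted from both sample and control; this blank is a hypothetical addition not described above. The function name is hypothetical.

```python
def percent_activity(a_sample, a_control, a_blank=0.0):
    """Residual activity (%) of an inhibitor-treated sample relative to the untreated control.

    a_sample, a_control: readings assumed proportional to glucose released.
    a_blank: hypothetical no-enzyme background subtracted from both readings.
    """
    control_signal = a_control - a_blank
    if control_signal <= 0:
        raise ValueError("control signal must exceed the blank")
    return 100.0 * (a_sample - a_blank) / control_signal
```

For example, with a blank of 0.05, a sample reading of 0.30 against a control of 0.55 corresponds to 50% residual activity.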
Intrinsic Fluorescence of Cwht1p with Glucose-Glucose binding to Cwht1p was investigated by tryptophan fluorescence. Fluorescence spectra were measured with a Shimadzu Scientific Instruments RF-5301PC spectrofluorophotometer. Samples were excited at 295 nm, and fluorescence emission was recorded from 290 to 420 nm. Slit widths were set to give band passes of 10 nm for excitation and 3 nm for emission. The cuvette was thermostatted at 20°C. Samples contained 15 μM Cwht1p in 20 mM sodium phosphate, pH 6.8, 100 mM NaCl, with glucose concentrations from 3.5 μM to 35 mM.
Docking Sugars and Inhibitors into the Active Site-Ligands were docked in silico into the proposed active site of Cwht1p. The ligands (glucose, miglitol, deoxynojirimycin, kojibiose, and glucotriose) include sugars that form part of the biological substrate as well as known inhibitors. All docking runs were performed with AutoDock Vina (46). Energy-minimized oligosaccharide PDB files were generated with the GLYCAM Carbohydrate Builder. Cwht1p and ligand pdbqt files were prepared for docking in the AutoDockTools GUI, which calculates surface grids with partial atomic charges. All non-ring bonds in the ligands were set as rotatable. AutoDock Vina was run in a 36 × 36 × 36 Å3 box around the active site with exhaustiveness set to 150. The top output poses were ranked by their calculated binding affinities.
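The docking setup above maps onto an AutoDock Vina configuration file, which Vina reads with `vina --config <file>`. The sketch below builds the text of such a file with the 36-Å box and exhaustiveness of 150 stated above. Several details are assumptions for illustration: the box center, the file names, and centering the box on the centroid of selected active-site atoms. The source does not report center coordinates. `num_modes = 9` matches the nine poses examined in the Results, but how the runs were actually configured is likewise assumed. The helper names are hypothetical.

```python
def box_center(*points):
    """Centroid of (x, y, z) points, e.g. hypothetical active-site atom coordinates."""
    if not points:
        raise ValueError("need at least one point")
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def vina_config(receptor, ligand, center, size=(36.0, 36.0, 36.0),
                exhaustiveness=150, num_modes=9, out="poses.pdbqt"):
    """Return AutoDock Vina config-file text for a rectangular search box (sizes in Å)."""
    cx, cy, cz = center
    sx, sy, sz = size
    lines = [
        f"receptor = {receptor}",
        f"ligand = {ligand}",
        f"center_x = {cx:.3f}",
        f"center_y = {cy:.3f}",
        f"center_z = {cz:.3f}",
        f"size_x = {sx:.1f}",
        f"size_y = {sy:.1f}",
        f"size_z = {sz:.1f}",
        f"exhaustiveness = {exhaustiveness}",
        f"num_modes = {num_modes}",
        f"out = {out}",
    ]
    return "\n".join(lines) + "\n"
```

In practice, the returned text would be written to a file such as `conf.txt` and passed to `vina --config conf.txt`. The box center might be taken from `box_center(...)` applied to the coordinates of the two proposed catalytic carboxylates.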

RESULTS
Cwht1p Structure-The structure of Cwht1p was solved to 2.04 Å (structure and topology in Fig. 1 and supplemental Fig. S2; crystallographic statistics in supplemental Table S1 and "Experimental Procedures"; PDB code 4J5T). Cwht1p is a globular, two-domain protein with dimensions of roughly 95 × 45 × 55 Å. The N-domain (residues −2 to 278) consists of an N-terminal α-helix (NH1), a 13-strand super-β-sandwich (NS1-NS13), and two α-helices (NH2,3) between NS11 and NS12. The N- and C-domains are connected by 42 residues, which include one linker α-helix, LH1. The C-domain (residues 320 to 808) consists of 12 helices (CH1-CH12) in an (α/α)6 toroid bundle. It also contains an extra structural unit, the C′-region, with two α-helices (C′H1,2) and eight β-strands (C′S1-8). Density was missing for one amino acid at the N terminus and for two loops: residues 226-231 between NH2 and NH3, and residues 474-492 between CH4 and CH6. Two N-glycans, each consisting of two GlcNAc residues, were visible in the electron density, linked to Asn9 and Asn89 in the N-domain. One disulfide bond is present between Cys636 and Cys652. The final refined model contains 284 water molecules.
Cwht1p expressed here in P. pastoris is an N-glycoprotein, as previously shown by PNGase treatment and a band shift on SDS-PAGE (28). Deglycosylation produces a band shift of 2-4 kDa. Assuming that both glycans (at Asn9 and Asn89) have identical composition, roughly 3-8 additional sugar residues are likely present on each glycan but disordered in the crystal. Neither glycosylation site is located near a crystal packing interface. Both glycans point away from the protein, and the modeled residues make no major protein-sugar contacts. No unmodeled density near symmetry-related molecules suggests that the distal ends of the glycans participate in crystal packing.
Crystal Packing and Interfaces-There are two major crystal packing interfaces (Fig. 2A). Their calculated solvation free energies are −14.5 kcal/mol for the large interface (1456 Å2) and −5.1 kcal/mol for the small interface (747 Å2); negative values indicate thermodynamic favorability. The large interface contains the His6 tag of a symmetry-related molecule (Fig. 2, B and C). The tag alone contributes 491 Å2 of the buried surface area and −5.4 kcal/mol of the solvation free energy of that interface.
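The share of the large interface attributable to the His6 tag follows directly from the PISA values above. As a quick arithmetic check using only the numbers quoted in the text, the tag accounts for roughly one third of the buried area and a somewhat larger share of the solvation free energy.

```python
# Values quoted in the text for the large crystal-packing interface (PDBePISA).
interface_area = 1456.0  # buried surface area, Å^2
interface_dg = -14.5     # solvation free energy, kcal/mol
tag_area = 491.0         # contribution of the His6 tag, Å^2
tag_dg = -5.4            # contribution of the His6 tag, kcal/mol

area_fraction = tag_area / interface_area
dg_fraction = tag_dg / interface_dg

print(f"His6 tag share of buried area: {area_fraction:.0%}")
print(f"His6 tag share of solvation free energy: {dg_fraction:.0%}")
```

This works out to about 34% of the area and 37% of the solvation free energy. These fractions help explain why the tag remains bound in the active site of its neighbor.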
Similar Structures-A DALI search of the PDB gave several hits with high Z-scores and good structural similarity to Cwht1p (supplemental Table S2). Among the hits, the shorter proteins aligned only to the C-domain. The longer proteins had regions similar to both the N- and C-domains of Cwht1p, with variable interdomain orientation. Notably, the top two DALI hits were YgjK and TTHA0978, the two bacterial GH family 63 structures. A structure-based multiple sequence alignment of Cwht1p, YgjK, and TTHA0978 was performed together with the human and rat GluI primary sequences. The C-terminal part of this alignment is presented in supplemental Fig. S3.
GH Family 63 Structures and Alignment-The two domains of Cwht1p are found in numerous other solved protein structures (supplemental Table S2). Cwht1p is most similar in overall architecture to the two-domain protein YgjK, the E. coli GH 63 member. When the C-domains of YgjK and Cwht1p are superposed, the N-domains are rotated ~15° relative to each other about the major axis of the protein (supplemental Fig. S4A). Compared individually, the N- and C-domains have backbone r.m.s. deviations of 2.6 and 2.5 Å, respectively. TTHA0978, the Thermus thermophilus GH 63 member, has a single domain that is structurally similar to the C-domain of Cwht1p (backbone r.m.s. deviation 2.7 Å). No other GH 63 structures have been solved to date. The third hit, trehalase (Tre37), is a known family 37 glycoside hydrolase and shares the GH-G fold architecture with GH 63 proteins. Subsequent neighbors are glycoside hydrolases of the GH-L clan, which contain similar (α/α)6 barrel and super-β-sandwich domains. TTHA0978 and Tre37 contain only the (α/α)6 toroid domain, whereas the remaining hits also contain the super-β-sandwich domain. The active-site alignment of the top four similar structures with Cwht1p is shown in Table 1 and supplemental Fig. S4C.
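The backbone r.m.s. deviations above summarize how closely matched atoms coincide after superposition. The Dali server determines the residue equivalences and the optimal superposition itself. The minimal sketch below shows only the final formula. It assumes the matched Cα coordinate pairs are already superposed, and the function name is hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between matched, already-superposed (x, y, z) points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need equal-length, non-empty coordinate lists")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))
```

If the structures are not yet superposed, an optimal rotation (for example, by the Kabsch algorithm) must be applied first; otherwise the value overstates the true deviation.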
Mutant Protein Purification and Characterization-Based on the active-site residues suggested by the structure, six single and double point mutants of Cwht1p were expressed in P. pastoris. They were purified by nickel-nitrilotriacetic acid IMAC and gel filtration chromatography (supplemental Fig. S5, A-C). None of the mutants showed catalytic activity against the tetrasaccharide substrate (supplemental Fig. S5D). Circular dichroism data for the mutant and native proteins (supplemental Fig. S5E) indicate no gross structural differences between them.
Glucose and Inhibitor Binding Experiments-Before the in silico docking experiments, we examined binding of Cwht1p to glucose (the reaction product) and to candidate inhibitors. Tryptophan fluorescence of Cwht1p was measured in the absence and presence of glucose, with the spectra presented in Fig. 3A. Adding glucose did not shift the peak wavelength; λmax was 330 nm in all spectra. Increasing the glucose concentration increased the intensity of tryptophan fluorescence, suggesting stable binding of glucose at tryptophan-rich sites. Of the 18 tryptophan residues in Cwht1p, eight are exposed to solvent. Four of these lie in the proposed active-site pocket (Trp381, Trp710, Trp715, and Trp789). With 18 tryptophans present, these data cannot show which particular residues interact with glucose. Multiple binding locations are possible, and glucose binding could cause conformational changes that alter the local chemical environment of one or more tryptophan residues.

Figure 1 legend (partial): In the center view, the N and C termini are at the back of the structure, pointing into the page. Rotated views at left are not to scale with the center view. The two loops that could not be modeled due to lack of density are indicated with a single asterisk (residues 226-231) and double asterisks (residues 474-492). Top right shows a stick model of GlcNAc2 linked to each of asparagine residues 89 and 9, modeled into the electron density map. At bottom right, a larger view of the C-domain center, with secondary structure elements labeled as discussed under "Results" (C or C′ for C-domain or C′-region, respectively; H or S for helix or strand, respectively, numbered within those regions); side chains of residues Asp568 and Glu771 are shown as orange sticks. (Note: Fig. 4 is in the same reference orientation.)
However, half of the solvent-exposed tryptophans are found in the active site, and this enzyme acts on glucose oligosaccharides to release glucose. It is therefore reasonable to expect that glucose binds in the active site, consistent with the tryptophan fluorescence results. These data motivated in silico docking to the Cwht1p active site to investigate substrate and ligand binding modes.
Six known glycoside hydrolase inhibitors (supplemental Fig. S1) were screened for inhibition of Cwht1p activity, with results shown in Fig. 3B. Of the six, only miglitol was an effective inhibitor at 1 mM; a dose-response curve gave an IC50 of 22 μM for miglitol (Fig. 3C). Limited availability of the tetrasaccharide substrate prevented determination of a full Ki. However, this IC50 is in the same range as the Ki (50 μM) of deoxynojirimycin (DNJM), the parent compound of miglitol. As a glucose analog, miglitol likely binds in the active site of the enzyme, as has been seen in other inhibitor-bound glycosidase structures (47-50).
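The comparison above between an IC50 and a Ki can be made explicit with the Cheng-Prusoff relation. For a purely competitive inhibitor under steady-state conditions, Ki = IC50 / (1 + [S]/Km). Competitive inhibition by miglitol is an assumption here, because the source does not establish the inhibition mode. The substrate concentration and Km in the usage note are hypothetical placeholders, not values from this work. The function name is hypothetical.

```python
def ki_from_ic50_competitive(ic50, substrate_conc, km):
    """Cheng-Prusoff estimate of Ki for a competitive inhibitor: Ki = IC50 / (1 + [S]/Km).

    All three arguments must use the same concentration unit (e.g. micromolar).
    """
    if km <= 0:
        raise ValueError("Km must be positive")
    return ic50 / (1.0 + substrate_conc / km)
```

For instance, `ki_from_ic50_competitive(22.0, 100.0, 100.0)` returns 11.0, half the IC50 when [S] equals Km. For a competitive inhibitor, the IC50 is therefore an upper bound on Ki and approaches Ki only when [S] is well below Km.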
In an attempt to displace the histidine tag from the active site, the structures of Cwht1p soaked with inhibitor (DNJM and   Table S3). Despite high concentrations of inhibitor, the histidine tag was not displaced from the active site of the symmetry-related mate (supplemental Fig. S6). Extensive efforts to crystallize the protein following enzymatic cleavage of the histidine tag were unsuccessful.
Glucose and Inhibitor Docking in Silico-The compounds α-D-glucose, miglitol, DNJM, and kojibiose (α-D-glucose-(1→2)-α-D-glucose) were independently docked to a single molecule of Cwht1p (that is, with no His tag in the active site) in a box around the active site, and the top binding results (poses) were modeled. The top poses for each are overlaid on the Cwht1p structure in Fig. 4A. The single-ring ligands all docked into two locations: the pocket containing the proposed catalytic residues ("site A"), and a second pocket roughly 12 Å away formed by residues 419-455 in the C′-region ("site B"). Site A is the proposed active-site pocket, and the single-ring ligands docked here (including the top pose) make polar contacts with Cwht1p residues Trp391, Asp392, Arg428, Gly566, Asp568, Trp710, and Glu771. The single-ring ligands binding to site B make polar contacts with Cwht1p residues Glu361, Glu443, Arg428, Glu429, Phe444, Val446, Gln447, and Asn448. No stacking interactions were seen between these ligands and Cwht1p aromatic residues. The calculated binding affinities for glucose, DNJM, and miglitol ranged from −5.2 to −6.0 kcal/mol, −4.7 to −5.2 kcal/mol, and −5.0 to −5.5 kcal/mol, respectively.
The top docking affinities for glucose, DNJM, and miglitol were −5.9, −5.5, and −5.2 kcal/mol, respectively. These correspond to calculated Ki values at 37°C of 69, 132, and 215 μM. Glucose has not been tested directly as an inhibitor in the activity assay used here, because it is a substrate of the secondary glucose oxidase reporter reaction. Glucose inhibition can instead be evaluated indirectly, because glucose is a product of the Cwht1p catalytic reaction. In the activity assay, the reaction rate begins to decrease slightly only at the longest time point tested (120 min), by which point 120 μM glucose product has accumulated (38). This decrease is likely due to product inhibition of Cwht1p. It occurs at a glucose concentration of the same order of magnitude as the calculated glucose Ki of 69 μM.
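The conversion from docking score to Ki quoted above appears to treat the Vina affinity as a binding free energy, ΔG = RT ln Ki, so that Ki = exp(ΔG/RT). The sketch below applies this relation at 37°C. It gives about 70, 133, and 217 μM, within about 1% of the quoted values; the small differences presumably reflect the constants or unrounded scores used. Treating a Vina score as a true free energy is itself an approximation. The function name is hypothetical.

```python
import math

R_KCAL = 1.98720e-3  # gas constant, kcal/(mol*K)


def kd_from_dg(dg_kcal, temp_c=37.0):
    """Dissociation constant (M) from a binding free energy via dG = RT ln(K)."""
    temp_k = temp_c + 273.15
    return math.exp(dg_kcal / (R_KCAL * temp_k))


for name, dg in [("glucose", -5.9), ("DNJM", -5.5), ("miglitol", -5.2)]:
    print(f"{name}: Ki ~ {kd_from_dg(dg) * 1e6:.0f} uM")
```

Because Ki depends exponentially on ΔG, a difference of about 0.4 kcal/mol at 37°C roughly doubles the predicted Ki. Modest scoring errors therefore translate into large uncertainties in the estimated constants.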
The disaccharide kojibiose contains the structure of the two terminal α(1→2)-linked glucoses in the natural substrate, Glc3Man9GlcNAc2. As a larger molecule, kojibiose gave a more complex set of results, with the top nine poses overlaid in the bottom panel of  between poses), all oriented with the non-reducing glucose in the pocket and the reducing end pointing outward, away from site A. The outlying pose found in site B has the opposite orientation: the reducing glucose is in the site B pocket, with the non-reducing glucose pointing outward. No stacking interactions between kojibiose residues and Cwht1p aromatic residues were seen in any binding mode.
Glucotriose docking was performed to investigate potential catalytic binding sites for the minimum substrate of Cwht1p and the subsites occupied by its sugar residues. The top nine docked poses had binding affinities ranging from −6.5 to −7.6 kcal/mol. These poses were manually sorted by their approximate shared orientations into four major binding modes (Fig. 4B): A1, A2, A3, and B. Each mode is named for the site (A or B) occupied by the trisaccharide. The Cwht1p contacts with each sugar residue are listed in Fig. 4B.
Of the four glucotriose binding modes, mode B is the most distinct. As seen in Fig. 4B, mode B contains Glc1 in site B, inaccessible to the proposed catalytic residues; there are no other suitable acid/base amino acids in this location to act as catalytic residues. As a result, this mode does not likely reflect the binding site for the glucotriose substrate, and it will be excluded from further analysis in proposing the catalytic site.
The remaining binding modes place the trisaccharide in site A. In modes A1 and A2, Glc1 is found in the active-site pocket; in mode A3, by contrast, the active-site pocket contains Glc2. In mode A2, the Glc2 and Glc3 residues are rotated 45° about the Glc1-Glc2 axis relative to mode A1. In all three of these binding modes, carbon 1 of Glc1 is positioned appropriately for glycoside hydrolysis by Asp568 and Glu771, the proposed catalytic residues.
Determination of the Substrate-binding Model-To identify the most likely substrate-binding model among these binding modes, we examined the glucotriose conformation. GluI enzymes from yeast and mammalian sources do not cleave simple sugar substrates such as p-nitrophenyl α-glucoside. Instead, they require at minimum the trisaccharide formed by the three non-reducing terminal sugars of the 14-residue oligosaccharide substrate (22, 28, 38). This linkage is unique in biology; no other reports of such a glucotriose are found in the literature. Because Glc2 carries glycosidic bonds at two neighboring carbons (1 and 2), this glucotriose has a distinctive shape. It is non-linear and bent back on itself, with an expected intra-chain interaction between Glc1 and Glc3. GluI is highly specific for this sugar and does not cleave other linear glucose chains. This specificity suggests that the bent-back shape may be important for interaction with the active site and may determine selectivity in binding, catalysis, or both.
The sugar conformations in binding modes A1 and A2 have similar shapes, with an 80° angle along the chain; the angle in mode A3 is much more acute (Fig. 4C). Furthermore, mode A3 shows an intra-saccharide stacking interaction between Glc1 and Glc3 that is not seen in the other modes. This stacked pair lines up well with Tyr709 in the active site. This evidence, together with the highly specific and unique relationship between Cwht1p and its substrate, makes A3 the most likely binding mode.
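The source does not state how the ~80° angle along the chain was measured. One plausible definition, assumed here for illustration only, is the angle at the Glc2 ring centroid between the vectors to the Glc1 and Glc3 ring centroids, computed from docked-pose coordinates. The function names are hypothetical.

```python
import math


def ring_centroid(atoms):
    """Mean position of ring-atom (x, y, z) coordinates."""
    n = len(atoms)
    return tuple(sum(a[i] for a in atoms) / n for i in range(3))


def vertex_angle(p1, p2, p3):
    """Angle in degrees at p2 between the vectors p2->p1 and p2->p3."""
    v1 = [a - b for a, b in zip(p1, p2)]
    v2 = [a - b for a, b in zip(p3, p2)]
    dot = sum(a * b for a, b in zip(v1, v2))
    norm = math.sqrt(sum(a * a for a in v1)) * math.sqrt(sum(b * b for b in v2))
    if norm == 0:
        raise ValueError("points must be distinct")
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))
```

Under this definition, `vertex_angle(ring_centroid(glc1), ring_centroid(glc2), ring_centroid(glc3))` would give a smaller value for the folded-back A3 pose than for A1 or A2. The arguments `glc1` to `glc3` are hypothetical lists of ring-atom coordinates from a docked pose.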
Residue Conservation Supports Mode A3 as the Substrate-binding Model-All eukaryotic GluI homologs share specificity for the glucotriose discussed here. This substrate has not been tested with prokaryotic GluI homologs, which are active against other oligosaccharides. However, prokaryotic glycans have not been shown to contain the glucotriose with α(1→2) and α(1→3) linkages (51). Therefore, residue conservation in GH 63 enzymes can be considered in light of the binding model proposed here. Based on the structure-based sequence alignment shown in supplemental Fig. S3, supplemental Table S3 presents the conservation of the key residues involved in mode A3, the most likely binding model.
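Tabulating conservation from a structure-based alignment requires two steps. First, each Cwht1p residue number is mapped to an alignment column. Second, that column is compared across sequences. The sketch below assumes that gapped sequences are plain strings with "-" for gaps, that the reference (Cwht1p) sequence is the first entry, and that identity to the reference residue is the conservation criterion. The source may have used a different criterion, such as similarity classes. The toy alignment in the usage note is invented, and the function names are hypothetical.

```python
def residue_to_column(aligned_ref, resnum, first_resnum=1):
    """0-based alignment column of residue `resnum` in a gapped reference sequence."""
    count = first_resnum - 1
    for col, ch in enumerate(aligned_ref):
        if ch != "-":
            count += 1
            if count == resnum:
                return col
    raise ValueError("residue number outside the aligned sequence")


def column_conservation(aligned_seqs, column, reference_index=0):
    """Fraction of sequences identical to the reference residue at one column; gaps count as different."""
    ref = aligned_seqs[reference_index][column]
    if ref == "-":
        raise ValueError("reference sequence has a gap at this column")
    same = sum(1 for seq in aligned_seqs if seq[column] == ref)
    return same / len(aligned_seqs)
```

For example, in the invented alignment `["ADEW", "ADQW", "A-EW"]`, `column_conservation(..., 2)` gives 2/3, because the reference residue E is shared by two of the three sequences.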

DISCUSSION
We have determined the first eukaryotic structure of GluI, a large glycosylated GH 63 enzyme responsible for the first step in the N-glycosylation trimming pathway. The structure identifies the catalytic residues as Asp568 and Glu771, and mutating them abolishes activity. Using docking methods with inhibitors and the substrate as ligands, we have mapped the active-site cleft and proposed a substrate-binding model.

Cwht1p Structural Features and Key Residues-Cwht1p consists of two domains joined by a linker helix. The N-domain is a super-β-sandwich, and the C-domain is an (α/α)6 toroid with additional structural units, termed the C′-region, on one face. The two closest structural neighbors (YgjK and TTHA0978) are, like Cwh41p, GH family 63 members, validating the CAZy sequence-based classification of these proteins. Like Cwht1p, all of the characterized closest structural neighbors are inverting glycosidases (supplemental Table S2); the active sites of four of these nearest neighbors have been characterized. These enzymes vary in substrate specificity. They resemble either both domains of Cwht1p or only its (α/α)6 toroid domain. The active site lies at the center of the (α/α)6 barrel. Substrates access the active site from the same face as the C′-region. This region varies greatly between structural neighbors and therefore likely provides substrate selectivity.
The role of the super-β-sandwich domains in Cwht1p and its structural neighbors is unclear; this fold resembles a family of carbohydrate-binding modules (52). The domain may mediate protein-protein interactions with neighboring enzymes along the N-glycosylation trimming pathway, such as oligosaccharyl transferase or α-glucosidase II. Alternatively, it may interact with substrate N-glycoproteins. The glycans on the N-domain of Cwht1p may not be present in the native enzyme, because host cell and expression conditions often alter glycosylation patterns (53-57). The glycosylation status of Cwht1p does not affect catalytic activity in vitro (58). Thus, no specific conclusion can be drawn about the role or presence of these glycans in GluI across species.
Cwht1p is the soluble C-terminal construct of Cwh41p, a type II membrane protein. In the Cwht1p model, the N terminus protrudes from the convex face of the protein. In Cwh41p, this would be the location of the 33-residue transmembrane pass across the endoplasmic reticulum membrane. No clear electrostatic or hydrophobic patch on this face of the protein indicates interaction with the membrane. In previous overexpression studies of full-length Cwh41p in S. cerevisiae, both the membrane-bound form and a soluble truncated form were isolated, despite the presence of a range of protease inhibitors during purification (23). This evidence of proteolytic cleavage in the native host could indicate that a portion of Cwh41p is present and active in soluble form in the endoplasmic reticulum, without being tethered to the membrane. Similar proteolytic release of transmembrane proteins has been seen for glycosyltransferases in the Golgi (59, 60).
Previous studies have shown the in vivo and in vitro effects of mutations within α-glucosidase I. Using the structure-based sequence alignment, we mapped the residues associated with these mutations onto the Cwht1p structure. The results are summarized in Table 2. In general, mutations interfering with the active site have been shown to reduce activity, as expected. These studies report no N-domain mutations and no benign mutations. Two smaller constructs of Cwh41p (Cwht2p and Cwht3p) were also cloned. Their expression was attempted in S. cerevisiae by others (58) and in P. pastoris by ourselves (28). Analysis of the Cwht1p structure sheds light on why these constructs failed to express in either host (Table 2).
Catalytic Residues of GluI. Prior biochemical studies have proposed two possible pairs of catalytic residues in Cwh41p: Asp 584 and Glu 771 of Cwht1p (from primary sequence alignment with the prokaryotic GluI, YgjK (27)), and Asp 584 and Glu 580 (from primary sequence alignment with the proposed mammalian GluI substrate-binding motif (26)). Alanine mutations of Asp 584 and Glu 580 resulted in loss of Cwht1p activity (58). In the structure solved here, these two residues lie at the N terminus of helix CH6. They face the interior of the protein and make polar contacts with several residues in the loop following C′H2. Given their solvent inaccessibility, they are unlikely to be catalytic residues. However, mutation of these residues to alanine, a small nonpolar residue, would disrupt the contacts with the C′-region, possibly leading to misfolding. This is consistent with the decreased expression levels and abrogated catalytic activity of the D584A and E580A constructs. No studies have been published documenting mutation of Glu 771 or its corresponding residue in other homologs. In the Cwht1p structure, Glu 771 is solvent-accessible.
Alignment of the C-domain of Cwht1p with the top-ranked structural homologs reveals tight structural conservation with the GH-G and GH-L fold clans, particularly of the (α/α)6 barrel, despite relatively low sequence identity (supplemental Table S2). The C′-domain is much more variable between structural homologs, consistent with their differing substrate specificities. Within the GH-G and GH-L clans, the active site is found at the center of the (α/α)6 bundle, and hydrolysis proceeds via an acid-base mechanism in which a pair of carboxylic acid residues catalyzes the reaction. The catalytic residues (glutamate and aspartate) of the characterized structural neighbors align with Glu 771 and Asp 568 of Cwht1p at the core of the bundle (Table 1 and supplemental Fig. S4C). We therefore propose these two amino acids as the catalytic residues for glycoside hydrolysis by Cwh41p. Cwht1p single and double point mutants at Asp 568 and Glu 771 were expressed and purified (supplemental Fig. S5). They share expression and purification properties similar to those of native Cwht1p, and circular dichroism data indicate that the mutations do not induce large structural deviations from the native state. However, the mutants are unable to cleave the tetrasaccharide substrate. We have thus obtained properly folded, catalytically inactive mutants of Cwht1p for use in active site investigations.
Interestingly, in a structural overlay, the proposed catalytic residues Asp 568 and Glu 771 of Cwht1p align, respectively, with Asp 501 and Glu 727 of YgjK (Table 1 and supplemental Fig. S4C). This contrasts with the primary sequence alignments (27), which supported Cwht1p Asp 584 and Glu 771 as catalytic residues; the initial primary sequence alignments were therefore incorrect in the region of Cwht1p Asp 568. Three-dimensional information allows an improved structure-based sequence alignment of the yeast, bacterial, and mammalian GH 63 members (supplemental Fig. S3). In this alignment, the previously proposed (26) mammalian (human) binding sequence ERHLDLRCW (residues 594–602) aligns with Cwht1p ELNVDALAW (residues 580–588), which lies in the C′-region of the Cwht1p structure. This loop does not align well with the prokaryotic structures (YgjK and TTHA00978) and varies widely between similar structures. The mammalian binding sequence was proposed on the basis of chemical modification studies supporting the presence of Arg, Trp, Tyr, and Cys in the active site. However, the three-dimensional Cwht1p structure contains several Arg, Trp, Tyr, and Cys residues that are conserved but distant from one another in primary sequence. In particular, Arg 387, Trp 391, Tyr 709, Trp 710, Arg 711, and Trp 789 line the proposed binding pocket and are conserved between the yeast and mammalian sequences in the structure-based alignment. Thus, the chemical modification experiments that led to the proposed mammalian binding motif can be reinterpreted, in light of the Cwht1p structure, as supporting a catalytic site surrounding Asp 568 and Glu 771.
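Because the human and yeast segments are paired residue-for-residue (594–602 with 580–588), residue numbers transfer by a constant offset. The following is a minimal Python sketch assuming a gap-free alignment over this stretch, as the paired numbering implies; the helper names `map_residue` and `percent_identity` are hypothetical and not part of any published tool.

```python
"""Transfer residue numbers across a gap-free aligned segment (illustrative sketch)."""

# Segments and starting residue numbers as given in the text; helper names are hypothetical.
HUMAN_START, HUMAN_SEG = 594, "ERHLDLRCW"  # proposed mammalian binding sequence
YEAST_START, YEAST_SEG = 580, "ELNVDALAW"  # aligned Cwht1p segment


def map_residue(human_number: int) -> tuple:
    """Return (human residue, Cwht1p residue number, Cwht1p residue) for one position."""
    offset = human_number - HUMAN_START
    if not 0 <= offset < len(HUMAN_SEG):
        raise ValueError(f"residue {human_number} lies outside the aligned segment")
    return HUMAN_SEG[offset], YEAST_START + offset, YEAST_SEG[offset]


def percent_identity(a: str, b: str) -> float:
    """Percentage of identical positions between two equal-length, gap-free segments."""
    if len(a) != len(b):
        raise ValueError("segments must have equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    for number in range(HUMAN_START, HUMAN_START + len(HUMAN_SEG)):
        human_res, yeast_number, yeast_res = map_residue(number)
        print(f"human {human_res}{number} <-> Cwht1p {yeast_res}{yeast_number}")
    print(f"identity: {percent_identity(HUMAN_SEG, YEAST_SEG):.1f}%")
```

Run as a script, this pairs human Glu594 and Asp598 with Cwht1p Glu 580 and Asp 584, the two residues examined by alanine mutagenesis, and reports 33.3% identity (Glu, Asp, and Trp) over the nine positions.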
Crystal Packing and Utility of this Model for Further Active Site Investigations. The crystal form used here (28) was the only reproducible form obtained from screening and optimization of sparse matrix screens. Packing analysis of this form shows that Cwht1p crystallized with two main interfaces, burying a total surface area of 2203 Å². Each interface is much larger and more thermodynamically favorable than is typical of crystal contacts (61, 62). The thermodynamic favorability of this crystal form could explain why it appeared in multiple distinct conditions, and why no other reproducible form was obtained with this construct across many conditions. The larger interface contains many interactions between the C-terminal His6 tag of one monomer and the interior of the (α/α)6 bundle of a crystallographic symmetry-related monomer (Fig. 2, B and C). These interactions contribute approximately one-fourth of the buried surface area and solvation free energy of that interface, a significant contribution from this non-native tag. Notably, His 808 is hydrogen-bonded to Glu 771, a proposed catalytic residue. His 806–808 also interact with several highly conserved residues (Asp 392, Phe 389, Phe 385, and Phe 444) and one moderately conserved residue (Arg 428).
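The "one-fourth of the buried surface area" figure is a ratio of per-residue buried areas. The minimal Python sketch below shows that bookkeeping, assuming buried area is computed as SASA(A) + SASA(B) − SASA(AB); conventions differ, and some programs (such as PISA) report an "interface area" equal to half of this total. All numbers and the residue labels `tag_*`/`core_*` are hypothetical toy values, not measurements from the Cwht1p crystals.

```python
"""Buried-surface-area bookkeeping for a crystal contact (illustrative sketch).

All numbers in the example are toy values, not measurements from the Cwht1p crystals.
"""


def buried_surface_area(sasa_a: float, sasa_b: float, sasa_complex: float) -> float:
    """Total solvent-accessible area (Å²) lost on association: SASA(A) + SASA(B) - SASA(AB)."""
    return sasa_a + sasa_b - sasa_complex


def fraction_from_subset(per_residue_bsa: dict, subset: set) -> float:
    """Fraction of an interface's buried area contributed by the residues in `subset`."""
    total = sum(per_residue_bsa.values())
    if total <= 0:
        raise ValueError("interface has no buried area")
    part = sum(area for residue, area in per_residue_bsa.items() if residue in subset)
    return part / total


if __name__ == "__main__":
    # Toy per-residue buried areas (Å²); "tag_*" stands for hypothetical His-tag residues.
    toy_interface = {"tag_1": 50.0, "tag_2": 60.0, "tag_3": 40.0,
                     "core_1": 150.0, "core_2": 150.0, "core_3": 150.0}
    tag = {"tag_1", "tag_2", "tag_3"}
    print(buried_surface_area(1000.0, 1000.0, 1600.0))  # 400.0
    print(fraction_from_subset(toy_interface, tag))      # 0.25
```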
Following soaks with the inhibitors DNJM and miglitol at high concentration (10–100-fold the Ki or IC50), the histidine tag, rather than the inhibitor, was still clearly seen in the electron density of the active site (supplemental Fig. S6). Soaking at higher inhibitor concentrations resulted in crystal damage; it is likely that displacement of the histidine tag disrupts this major interface. As a result, this crystal form is not optimal for crystal soaks and active site mapping, and so we proceeded with an in silico approach to address these investigations.

TABLE 2 Proposed effects of known α-glucosidase I mutations or truncations
Non-yeast mutations have been aligned to Cwht1p using the structure-based sequence alignment shown in Figure S4. Structural terminology is consistent with Fig. 1.
Mapping Active Site with Inhibitors and Glucose. Intrinsic tryptophan fluorescence experiments here provide preliminary, qualitative support for glucose binding (Fig. 3A). In addition, three GluI inhibitors have been characterized to date: miglitol and DNJM, both single-ring glucose analogues, and the disaccharide kojibiose, an α(1→2)-linked glucobiose. In this work, docking studies with these ligands were used to map the catalytic site.
All the single-ring ligands docked to two sites in the center of the Cwht1p catalytic domain: the proposed active site pocket (site A) and a nearby pocket roughly 12 Å away (site B) (Fig. 4A). All top-ranked poses were found in site A. In both sites, the ligands made polar contacts with several Cwht1p residues. Notably, among the site A contacts, the ligands interact with two tryptophan residues (Trp 710 and Trp 391) and block solvent accessibility to two others (Trp 715 and Trp 789); this result is consistent with our experimental tryptophan fluorescence data. The docked ligands also interact with Asp 568 and/or Glu 771 (varying between poses), the proposed catalytic residues. Blocking substrate access to these residues would abrogate catalytic activity, as has been observed structurally in other glycoside hydrolases inhibited by monosaccharide analogues (47–50). Site B does not contain the carboxylic acid residues necessary for glycoside hydrolysis, and so is unlikely to be the active site pocket for cleavage of the terminal glucose from the 14-mer oligosaccharide substrate.
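The two criteria used above to distinguish the pockets, their separation and whether they contain an Asp/Glu residue, are simple geometric and compositional checks. A minimal Python sketch, assuming each pocket is summarized as a list of atom coordinates (Å) and lining-residue labels such as those that could be exported from docking output; the functions and all coordinates are hypothetical, and only the site A residue labels come from the text.

```python
"""Compare two docking pockets by centroid distance and carboxylate content (sketch)."""
import math

ACIDIC = {"ASP", "GLU"}  # residues able to supply a catalytic carboxylate


def centroid(points):
    """Arithmetic mean of a list of (x, y, z) coordinates in Å."""
    n = len(points)
    return tuple(sum(p[axis] for p in points) / n for axis in range(3))


def site_distance(site_a, site_b):
    """Distance (Å) between the centroids of two pockets."""
    return math.dist(centroid(site_a), centroid(site_b))


def has_carboxylate(residue_labels):
    """True if any lining residue (labels such as 'Asp568') is Asp or Glu."""
    return any(label[:3].upper() in ACIDIC for label in residue_labels)


if __name__ == "__main__":
    # Toy coordinates only; real values would come from the structure or docking output.
    site_a_atoms = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
    site_b_atoms = [(13.0, 0.0, 0.0)]
    print(site_distance(site_a_atoms, site_b_atoms))
    print(has_carboxylate(["Trp710", "Trp391", "Asp568", "Glu771"]))  # site A contacts
    print(has_carboxylate(["Ser", "Asn", "Tyr"]))  # hypothetical site B labels
```

`math.dist` (Python 3.8+) computes the Euclidean distance; here it returns 12.0 for the toy centroids, and the carboxylate check returns True for the site A residues and False for the placeholder site B list.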
Interestingly, the top-ranked binding mode placed the non-reducing glucose of kojibiose in site B, distal to the catalytic residues in the active site pocket. Kojibiose makes several polar contacts in site B; this pocket contains few hydrophobic residues and so could be involved in binding the hydroxyl groups of non-terminal residues in the 14-mer biological substrate, Glc3Man9GlcNAc2. Cwht1p mutagenesis, kinetic evaluation, and co-crystallization experiments are required to investigate the potential role of this pocket in inhibitor binding.
Glucotriose Structure in Determining the Substrate-binding Model. Based on catalytic residue accessibility, GluI substrate selectivity, and glucotriose conformation, we evaluated the docked binding modes to propose a substrate-binding model, as shown in Fig. 5. Subsite −1 is found under the loop between CH9 and CH10, with Glc1 stabilized by a polar contact with Glu 707 and a stacking interaction with Tyr 709. Subsite +1 is the "active site pocket," where Glc2 makes specific polar contacts with Trp 391, Asp 392, and Trp 710 in this otherwise hydrophobic cleft. Subsite +2 contains Glc3, under the loop containing helix C′H1; Arg 428 makes a polar contact with this sugar. A further sugar-protein contact could form following a small conformational change from this pose: Phe 444, located in the C′H1 loop, could join the Tyr 709/Glc1/Glc3 stack. The anomeric carbon of Glc3 points toward site B; the remainder of the 14-mer glycan could lie in this cleft or could instead protrude outward from Cwht1p with minimal protein contact. The latter is supported by the unchanged kinetic parameters between the trisaccharide and tetrasaccharide substrates (28, 45).
FIGURE 5. A, proposed sugar subsites −1, +1, and +2 are circled in yellow, green, and blue, respectively; catalytic residues are shown in red. In this model, the bond between Glc1 and Glc2 (the −1 and +1 subsites) is cleaved by Asp 568 and Glu 771. B, polar contacts and stacking interactions between glucotriose and Cwht1p. Stacking-interaction amino acids are shown in orange, proposed catalytic residues in red, and polar contacts as red dashed lines. Glc3, Glc1, and Tyr 709 stack together in the docking results. Phe 444 is found on a flexible loop between helices C′H1 and CH4 that could move to stack with Glc3.

Cwht1p is expected to be a suitable model for other eukaryotic homologs, all of which share similar specificity for the glucotriose moiety containing α(1→2) and α(1→3) linkages. Conserved residues involved in the proposed model are listed in supplemental Table S3. The catalytic residues Asp 568 and Glu 771 are conserved across all species investigated, as are all residues interacting with Glc2. This is not surprising: this is the binding site found in many (α/α)6-barrel glycosidases with various oligosaccharide substrates, and the polar contacts here orient the sugar ring to place the anomeric carbon in position for glycosidic bond cleavage by the catalytic residues (63–65). In prokaryotes, Tyr 709 is highly conserved, but the residues interacting with Glc3 are not; in the prokaryotic structures, the region analogous to site B is occluded. Therefore, the unknown sugar substrate of the prokaryotic GH 63 members does not bind in the same fashion as Glc3 in Cwht1p. However, eukaryotic conservation at both the Glc1 and Glc3 sites supports the model proposed above for recognition of the glycan substrate by the yeast enzyme.
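Summarizing conservation at the model's contact residues (as in supplemental Table S3) amounts to reading alignment columns at Cwht1p positions. The minimal Python sketch below assumes the alignment is held as a dict of equal-length gapped strings with Cwht1p as the reference; the function names and the toy sequences are hypothetical and do not reproduce the actual GH 63 alignment.

```python
"""Check residue conservation at reference positions in a gapped alignment (sketch)."""


def column_of(residue_number: int, aligned_ref: str, ref_start: int) -> int:
    """0-based alignment column holding reference residue `residue_number`."""
    current = ref_start - 1
    for column, char in enumerate(aligned_ref):
        if char != "-":  # gaps do not advance the reference numbering
            current += 1
            if current == residue_number:
                return column
    raise ValueError(f"residue {residue_number} not in the aligned reference")


def is_conserved(residue_number: int, alignment: dict, ref_key: str, ref_start: int) -> bool:
    """True if every sequence carries the same (non-gap) residue in that column."""
    column = column_of(residue_number, alignment[ref_key], ref_start)
    residues = {seq[column] for seq in alignment.values()}
    return len(residues) == 1 and "-" not in residues


if __name__ == "__main__":
    # Toy alignment; the reference starts at residue 10 (A10, D11, E12, W13).
    toy = {"ref": "AD-EW", "homolog_1": "ADKEY", "homolog_2": "SD-EW"}
    for number in (10, 11, 12, 13):
        print(number, is_conserved(number, toy, "ref", 10))
```

With the toy alignment, positions 11 and 12 are reported as conserved and positions 10 and 13 are not; a real analysis would substitute the structure-based alignment and the Cwht1p residue numbers discussed above.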
Validation of in Silico Work. Despite the use of rigid-receptor docking, the calculated Ki values for the single-ring ligands show reasonable agreement with the micromolar values determined experimentally, supporting the binding modes observed. The calculated kojibiose affinity (top mode, 9.9 μM) was within roughly an order of magnitude of the experimental value (55 μM with membrane-bound Cwht1p (15)). However, in glucotriose docking, the calculated binding energies corresponded to affinities of 52 to 71 μM, roughly 20-fold lower than the experimental Km of 1.28 mM for the trisaccharide substrate bearing a hydrocarbon tail with Cwht1p (45). This deviation is not unexpected, as binding energies are predicted less accurately as ligand size increases (66).
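The comparisons above rest on the standard relation between binding free energy and a dissociation constant, K = exp(ΔG/RT). The Python sketch below applies it at an assumed 298.15 K; docking programs such as AutoDock report an estimated inhibition constant from their scored free energy, but the exact constants a given program uses may differ slightly. Comparing Km with a calculated affinity, as in the text, also assumes Km approximates the substrate's dissociation constant. The function names are hypothetical.

```python
"""Convert between binding free energy and dissociation constant (sketch)."""
import math

R_KCAL = 1.987204e-3  # gas constant, kcal mol^-1 K^-1
T_K = 298.15          # assumed temperature, K


def dg_to_k(dg_kcal: float, temp: float = T_K) -> float:
    """Dissociation constant (M) from a binding free energy (kcal/mol; negative = favorable)."""
    return math.exp(dg_kcal / (R_KCAL * temp))


def k_to_dg(k_molar: float, temp: float = T_K) -> float:
    """Binding free energy (kcal/mol) corresponding to a dissociation constant (M)."""
    return R_KCAL * temp * math.log(k_molar)


def fold_difference(k_reference: float, k_predicted: float) -> float:
    """How many times larger the reference constant is than the predicted one."""
    return k_reference / k_predicted


if __name__ == "__main__":
    km_trisaccharide = 1.28e-3          # experimental Km (M) from the text
    for k_calc in (52e-6, 71e-6):       # docking-derived affinities (M) from the text
        print(f"{k_calc * 1e6:.0f} uM -> dG = {k_to_dg(k_calc):.2f} kcal/mol, "
              f"{fold_difference(km_trisaccharide, k_calc):.1f}-fold below Km")
    print(f"kojibiose: {fold_difference(55e-6, 9.9e-6):.1f}-fold")
```

The fold differences for the glucotriose poses come out at about 24.6 and 18.0, consistent with the "roughly 20-fold" figure, and the kojibiose discrepancy is about 5.6-fold, within an order of magnitude.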
The precise binding location of inhibitors on Cwht1p is not definitively known. However, the docking results here agree well with the binding of single-ring ligands to the active sites of other inverting glycosidases (47–50). In addition, the damage to this crystal form upon inhibitor soaking, given that its packing depends heavily on active site contacts, supports inhibitor binding at the active site, in accordance with the docking results. These observations regarding the accuracy of binding modes and affinities are consistent with the literature: scored affinities are generally not highly accurate, and the strength of docking methods lies largely in accurate prediction of pose orientation rather than in energetic calculations (66, 67).
Conclusions. The structure presented here at 2-Å resolution, and the proposed substrate-binding model, establish the structural basis for the high substrate specificity of eukaryotic GluI. These results also demonstrate the use of in silico modeling as a method complementary to experimental work. The structure and model will inform further research into the relationship of GluI with its unique glucotriose substrate and pave the way for structure-based design of specific N-glycosylation inhibitors.