Structural insights into substrate specificity and function of glucodextranase.

A glucodextranase (iGDase) from Arthrobacter globiformis I42 hydrolyzes alpha-1,6-glucosidic linkages of dextran from the non-reducing end to produce beta-D-glucose via an inverting reaction mechanism and is classified into glycoside hydrolase family 15 (GH15). Here we cloned the iGDase gene and determined the crystal structures of iGDase in the unliganded form and in complex with acarbose at 2.42-Å resolution. The structure of iGDase is composed of four domains, N, A, B, and C. Domain A forms an (alpha/alpha)(6)-barrel structure and domain N consists of 17 antiparallel beta-strands; both domains are conserved in bacterial glucoamylases (GAs) and appear to be mainly concerned with catalytic activity. The structure of iGDase complexed with acarbose revealed that the positions and orientations of the residues at subsites -1 and +1 are nearly identical between iGDase and GA; however, the residues corresponding to subsite 3, which form the entrance of the substrate binding pocket, and the position of the open space and constriction of iGDase are different from those of GAs. On the other hand, domains B and C are not found in the bacterial GAs. The primary structure of domain C is homologous with a surface layer homology domain of pullulanases, and the three-dimensional structure of domain C resembles the carbohydrate-binding domain of some glycohydrolases.

Arthrobacter globiformis I42, a Gram-positive bacterium, produces a glucodextranase (iGDase) as an exocellular enzyme (1). Although this enzyme hydrolyzes the α-1,4-glucosidic linkages of starch to produce β-D-glucose, its activity toward α-1,4-glucosidic linkages is much lower than that toward the α-1,6-glucosidic linkages of dextran and isomaltooligosaccharides (2,3). Oguma et al. (4) isolated a GDase from a different Arthrobacter strain, A. globiformis T-3044 (tGDase), for use in industrial production of cyclodextrans (cycloisomaltooligosaccharides). The gene encoding tGDase has been cloned; it gives rise to a polypeptide of 1023 amino acid residues in the mature form (5), and the N-terminal portion of the primary structure of tGDase (residues 1-700) has been shown to have 38% identity to a glucoamylase (GA) from Clostridium sp. G0005. We also found that the C-terminal portion of tGDase (residues 700-1023) is weakly homologous with a surface layer homology (SLH) domain conserved in pullulanase from Thermococcus hydrothermalis EM1 (6) and amylopullulanase from Pyrococcus abyssi. The bacterial SLH domain is considered to mediate the binding between exocellular proteins and the cell surface.
In the classification of glycoside hydrolase (GH) family members based on the amino acid sequences (8-10), GDase is classified into GH family 15, whose major members are GAs (EC 3.2.1.3). GA hydrolyzes α-1,4-glucosidic linkages of starch, glycogen, and maltooligosaccharides to release β-D-glucose from the non-reducing end. Although GDase and GA cleave different types of glucosidic linkages, both enzymes invert the anomeric form of the substrate and liberate β-D-glucose with exo-type splitting. Other than GDase, an isomaltodextranase from A. globiformis T6 (11) and a dextran 1,6-α-isomaltotriosidase from Brevibacterium fuscum var. dextranlyticum (12) have been reported as exo-type dextranases. These exodextranases have, however, entirely different primary structures from iGDase and are categorized into GH families 27 and 49, respectively.
Recently, x-ray structures of fungal and bacterial GAs have been reported. The fungal GAs possess a starch-binding domain connected at either the N terminus (13) or the C terminus (14) of the catalytic domain, whereas the bacterial GAs do not have the starch-binding domain but have an extra domain in their N-terminal regions. In a bacterial GA from Thermoanaerobacterium thermosaccharolyticum, the catalytic domain is formed by an (α/α)6-barrel structure, and the extra domain in the N-terminal region is constructed of antiparallel β-sheets (15).
To clarify the molecular differences between GA and GDase, and to predict the function of the SLH domain of GDase, we cloned the gene for iGDase, which we describe here. We also crystallized iGDase using the native enzyme purified from the culture of A. globiformis I42, and present here the three-dimensional structure of iGDase at 2.42-Å resolution. To our knowledge, this is the first time that the three-dimensional structure of a full-length polypeptide containing the SLH domain has been analyzed. The structure of iGDase complexed with a pseudotetrasaccharide inhibitor, acarbose, has also been determined at 2.42-Å resolution.

EXPERIMENTAL PROCEDURES
Molecular Cloning and DNA Sequencing-The methods of molecular cloning and gene manipulation were based on those of Sambrook et al. (16). N-terminal and some internal amino acid sequences of iGDase were determined, and oligonucleotides encoding these sequences were prepared. PCR amplifications were carried out using several combinations of these oligonucleotides, but no positive clone was obtained, probably due to the high GC content of A. globiformis I42 genomic DNA (data not shown). While this procedure was being carried out, the nucleotide sequence of the tGDase gene was reported. Part of the deduced amino acid sequence of tGDase, ADGSPWDGTSVSRLW, contains a partial amino acid sequence of iGDase, ADGSPXD, and accordingly an oligonucleotide of the corresponding sequence of tGDase, 5′-CCA AAG CCG TCC AAC GCC GGT GCC GTC CCA CGG CGA ACC GTC NGC-3′, was synthesized and radiolabeled. SphI digests of A. globiformis I42 genomic DNA were ligated into pUC119, and the radiolabeled oligonucleotide described above was hybridized with the resultant genomic DNA library. A plasmid containing a fragment (4.1 kb), which encodes the N-terminal part of iGDase, was obtained and designated pG1D13. Gene walking was further carried out with a genomic DNA library constructed using SacI digests of A. globiformis I42 genomic DNA and pUC119. A fragment (4.5 kb), which encodes the C-terminal part of iGDase, was obtained, and the plasmid containing it was designated pG1D9. The nucleotide sequence of iGDase has been submitted to the DDBJ/EMBL/GenBank™ databases (accession number AB033333).
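The relationship between the radiolabeled probe and the peptide it targets can be checked with a short script. The following is a minimal sketch, assuming that the probe is the antisense strand (so its reverse complement is read in frame) and that the standard genetic code applies; the function names are illustrative and are not taken from the original work.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "N": "N"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (N stays N)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def translate(seq):
    """Translate in frame 1; a codon containing N gives a residue only if unambiguous."""
    peptide = []
    for i in range(0, len(seq) - len(seq) % 3, 3):
        codon = seq[i:i + 3]
        # Expand each N to all four bases and collect the possible residues.
        choices = [BASES if base == "N" else base for base in codon]
        residues = {CODON_TABLE["".join(c)] for c in product(*choices)}
        peptide.append(residues.pop() if len(residues) == 1 else "X")
    return "".join(peptide)


def matches_motif(peptide, motif):
    """Compare a peptide prefix with a motif in which X matches any residue."""
    if len(peptide) < len(motif):
        return False
    return all(m == "X" or m == p for m, p in zip(motif, peptide))


# Probe sequence as printed in the text, spaces removed.
PROBE = "CCA AAG CCG TCC AAC GCC GGT GCC GTC CCA CGG CGA ACC GTC NGC".replace(" ", "")
encoded_peptide = translate(reverse_complement(PROBE))
print(encoded_peptide)
print(matches_motif(encoded_peptide, "ADGSPXD"))
```

With the sequences as printed here, the translated probe reproduces the iGDase motif ADGSPXD. Residues 10 and 12 come out as Gly rather than the Ser quoted for tGDase, which may reflect how the probe or the peptide was transcribed in this copy rather than the original probe design.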
Purification, Crystallization, and Data Collection-Because the level of production of the recombinant iGDase in Escherichia coli was extremely low, iGDase was produced in the original strain A. globiformis I42 and purified as described (17). The crystals were grown at 20°C using the hanging drop vapor diffusion method, in which 1.5 μl of an 8 mg/ml iGDase solution in 20 mM sodium acetate buffer (pH 6.0) containing 5 mM calcium acetate was mixed with an equal volume of a crystallization reservoir solution containing 3.0% (w/v) polyethylene glycol 8000 and 80 mM potassium dihydrogen phosphate in 50 mM sodium acetate buffer (pH 5.1). To perform data collection at cryogenic temperatures, the crystals were immersed in a cryo-protectant solution consisting of the well solution with the addition of 30% (w/v) glycerol. The crystal of the complex of iGDase with the inhibitor was obtained by soaking in the same cryo-protectant solution containing 1.0 mM acarbose (Bayer AG, Germany). The diffraction data of unliganded iGDase and the iGDase-inhibitor complex were collected at beamlines BL38B1 of SPring-8 and NW-12 of PF-AR, respectively. All data were processed and scaled using HKL2000 (18). Data collection statistics are summarized in Table I.
Structure Determination and Refinement-The structure of unliganded iGDase was solved by molecular replacement with the program MOLREP in the CCP4 suite (19) and the program CNS (20), using the structure of T. thermosaccharolyticum GA (PDB entry 1LF6) (13) as a search model. Although the whole structure of T. thermosaccharolyticum GA was initially used as the search model, reasonable phases were not obtained. Therefore, the structure of T. thermosaccharolyticum GA was divided into two models, namely the parts containing domain N (residues 25-279) and domain A (residues 292-684), and these two models were used independently as the starting models (the detailed description of each domain is given under "Results"). After placement of domain A of the T. thermosaccharolyticum GA structure with CNS, clear 2Fo − Fc electron density was observed. This structure was fixed, and domain N of the T. thermosaccharolyticum GA structure was then placed using MOLREP. Several programs were tested for placing the search models, and the combination of programs described above gave the best results. All refinement cycles were carried out using the CNS protocols for simulated annealing and for minimization of coordinates and individual thermal parameters. After the initial refinement of the structure of domains N and A, the density modification protocol of CNS was applied, and adequate phases for all domains (N, A, B, and C) were obtained. The chain was traced into the 2Fo − Fc electron density corresponding to domains B and C of iGDase, which are not found in T. thermosaccharolyticum GA, using the program ARP/wARP (21). Manual adjustment and rebuilding of the model were carried out with the program Xfit in the XtalView system (22). Solvent molecules were gradually introduced where peaks above 4.0σ in the Fo − Fc electron density map were within hydrogen-bonding distance. To avoid overfitting of the diffraction data, a free R-factor calculated from a test set of 10% of the reflections, excluded from refinement, was monitored (23). Refinement of the final structure converged at an R-factor of 0.193 (Rfree = 0.227).
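The R-factor and free R-factor quoted above follow the conventional definition R = Σ||Fo| − |Fc|| / Σ|Fo|, evaluated over the working reflections and over the excluded test set, respectively. The sketch below illustrates that bookkeeping on invented toy amplitudes; the numbers, the random 10% split, and the function names are assumptions for illustration and do not reproduce the CNS implementation or its scaling.

```python
import random


def r_factor(f_obs, f_calc):
    """Conventional crystallographic R: sum(||Fo| - |Fc||) / sum(|Fo|)."""
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("amplitude lists must be non-empty and of equal length")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


def split_work_free(n_reflections, free_fraction=0.10, seed=0):
    """Randomly flag a fraction of reflection indices as the free (test) set."""
    rng = random.Random(seed)
    indices = list(range(n_reflections))
    rng.shuffle(indices)
    n_free = max(1, round(n_reflections * free_fraction))
    return sorted(indices[n_free:]), sorted(indices[:n_free])


# Toy amplitudes (hypothetical values, not data from this study).
F_OBS = [100.0, 80.0, 60.0, 40.0, 20.0, 90.0, 70.0, 50.0, 30.0, 10.0]
F_CALC = [90.0, 85.0, 60.0, 44.0, 18.0, 99.0, 63.0, 50.0, 33.0, 9.0]

work, free = split_work_free(len(F_OBS))
r_work = r_factor([F_OBS[i] for i in work], [F_CALC[i] for i in work])
r_free = r_factor([F_OBS[i] for i in free], [F_CALC[i] for i in free])
print(f"R = {r_work:.3f}, Rfree = {r_free:.3f}")
```

Because the test reflections never enter refinement, a free R-factor that stays close to the working R-factor, as in the values reported here, suggests that the model is not overfitted.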
The structure of the iGDase-acarbose complex was solved by molecular replacement using the unliganded iGDase as the starting model. After simulated annealing refinement using CNS, difference Fourier maps clearly revealed density corresponding to three saccharide units. An acarbose molecule was identified with the valienamine ring in a ²H₃ half-chair conformation and the other two sugar rings in the ⁴C₁ chair form. Topology and parameter files for acarbose were obtained from the HIC-Up data base (24). The structure of the iGDase-acarbose complex was finally refined with CNS (R-factor 0.196, Rfree 0.238). The figures were prepared using XtalView, PyMOL, Raster3D (25), or MolScript (26).
Protein Data Bank Accession Number-The atomic coordinates and structure factors of unliganded iGDase (code 1UG9) and acarbose complex (code 1ULV) have been deposited in the Protein Data Bank.

RESULTS
Comparison of the Primary Structures of iGDase and Related Enzymes-The iGDase gene was cloned, and the primary structure was deduced. A homology search of iGDase against other proteins was performed in the DDBJ data base using the BLAST program. A GDase from A. globiformis T-3044 (tGDase), a different strain from A. globiformis I42, was the most homologous protein (80% similarity) over the entire primary structure.
Apart from the homology to the GDase, the N-terminal and C-terminal parts of iGDase are individually homologous with different kinds of proteins. The primary structure of iGDase is divided into two regions based on the similarity of the primary structure; these regions are designated the N region (residues 1-689) and C region (residues 690-1020) (Fig. 1). The N region shows high similarity to bacterial GAs from organisms such as T. thermosaccharolyticum (27) (36% identity and 52% similarity) and Clostridium sp. G0005 (28) (36% identity and 52% similarity). Although the similarity to eukaryotic GAs, for example GAs from Aspergillus awamori var. X-100 and Saccharomycopsis fibuligera (30), was low (less than 10%), both iGDase and eukaryotic GAs share the five conserved regions proposed for GA (31). On the other hand, the C region shows 29% identity to the S-layer homology (SLH) domain of the T. hydrothermalis pullulanase (6). The SLH domain has been reported to serve as a cell wall anchor, and has also been found in several other exocellular proteins (32).
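Percent identity and percent similarity figures such as those quoted above are computed from a pairwise alignment. The following is a minimal sketch of one common convention, assuming that gapped columns are excluded from the denominator and that "similar" means membership in the same conservative-substitution group. The groups and the toy alignment are hypothetical; BLAST itself scores similarity with a substitution matrix, so its values may differ.

```python
# Hypothetical conservative-substitution groups (an assumption, not the BLAST definition).
SIMILAR_GROUPS = ["AVLIM", "FYW", "KRH", "DE", "STNQ", "G", "P", "C"]
GROUP_OF = {aa: index for index, group in enumerate(SIMILAR_GROUPS) for aa in group}


def identity_and_similarity(aligned_a, aligned_b):
    """Return (% identity, % similarity) over ungapped columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = similar = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            identical += 1
            similar += 1
        elif a in GROUP_OF and GROUP_OF.get(a) == GROUP_OF.get(b):
            similar += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared, 100.0 * similar / compared


# Toy alignment (invented for illustration; not iGDase or GA sequence data).
TOY_A = "ADGS-PWD"
TOY_B = "ADGTAPWE"
identity, similarity = identity_and_similarity(TOY_A, TOY_B)
print(f"identity {identity:.1f}%, similarity {similarity:.1f}%")
```

Similarity is never lower than identity under this convention, which is consistent with the 36% identity and 52% similarity reported for the N region.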
Overall Structure of iGDase-The crystal structures of unliganded iGDase and its complex with acarbose have been determined. Both crystals belonged to space group C2 and contained one molecule in an asymmetric unit. In the final 2Fo − Fc electron density map of both structures (contoured at 1σ), all amino acid residues were visible except the N-terminal residue, which protein sequencing had identified as glutamic acid; 584 solvent water molecules and 6 calcium ions were also well fitted. Ramachandran plots (33) calculated with the program PROCHECK (34) revealed that only one residue (Ala-838) was found in the disallowed region, yet electron density for this residue was well defined.
The whole structure of iGDase is shown in Fig. 2. iGDase is composed of four domains, N, A, B, and C. Domain N (residues 1-274), uniquely found in bacterial and archaeal glucoamylases and glucodextranases, is composed of 17 antiparallel β-strands. These β-strands are divided into two β-sheets, and one of the β-sheets is wrapped by an extended polypeptide that consists of the first 20 residues, which appear to stabilize domain N. Domain A (residues 275-685) is an (α/α)6-barrel structure that is common among the GH15 family enzymes, and two α-helices connect domains N and A by forming a linker region. Domain B (residues 686-773) consists of antiparallel β-sheets formed by eight strands. Domain C (residues 774-1020) is composed of 17 antiparallel β-strands forming three typical β-sheets, and from the first to the last strand in each sheet, the direction rotates by over 90°. The topology of the secondary structure elements of iGDase is described in Fig. 3. Six calcium ions are found per iGDase molecule: two, one, and three calcium ion-binding sites are located in domains N, A, and C, respectively. Although many interactions are observed between domains N and A, domains B and C are relatively isolated, and a significant curvature is located between domains A and B (Fig. 2c). A short linker consisting of nine amino acid residues (residues 685-693, AGTPLSSPE) connects an α-helix (AH13) in domain A and a β-strand (BS1) in domain B, and contains two proline residues and one glycine residue. The hydrophobic interactions among Ala-468 at AH7, Ala-755 at BS7, and Val-765 at BS8 are also involved in maintaining the curvature of iGDase. The sequence from Pro-688 to Pro-692 appears to be inflexible and to fix the peptide chain, whereas Gly-686 may provide rotational flexibility as a hinge.
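When working with the deposited coordinates, it can be convenient to map a residue number onto the domain boundaries given above. The following is a small sketch using only the residue ranges stated in the text; the helper name `domain_of` is hypothetical, and the nine-residue linker (685-693) is not treated separately, so its residues are assigned to domain A or B according to the stated boundaries.

```python
# Domain boundaries as stated in the text (inclusive residue ranges).
DOMAINS = [
    ("N", 1, 274),
    ("A", 275, 685),
    ("B", 686, 773),
    ("C", 774, 1020),
]


def domain_of(residue_number):
    """Return the domain label for a residue number, or raise if out of range."""
    for name, start, end in DOMAINS:
        if start <= residue_number <= end:
            return name
    raise ValueError(f"residue {residue_number} is outside 1-1020")


def domain_lengths():
    """Return the number of residues in each domain."""
    return {name: end - start + 1 for name, start, end in DOMAINS}


# Residues mentioned in the text: Ala-468 (AH7), Ala-755 (BS7), Val-765 (BS8), Ala-838.
for residue in (468, 755, 765, 838):
    print(residue, domain_of(residue))
print(domain_lengths())
```

The four ranges are contiguous and together account for all 1020 residues, which is a quick consistency check on the boundaries as printed.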
From the analysis of the primary structure and the x-ray crystallographic study, the function of the iGDase protein seems to be divided into two parts, the N region (domains N and A) and the C region (domains B and C). Structural homology searches for domains B and C were carried out separately using the DALI server (35) to predict the function of these two domains. Numerous homologous proteins were found (193 and 61 proteins and fragments were listed in the DALI results for domains B and C, respectively), but their specific functions have not been identified. Domain B shows remarkable homology with proteins containing the immunoglobulin-like fold, chitinase (PDB 1CTN) (36) and fibronectin fragment (PDB 1FNH) (37). Domain C shows homology with the carbohydrate-binding unit of some glycosidases, endo-1,4-β-xylanase (PDB 1I82) (38) and exo-1,4-β-D-glycanase (PDB 1EXG) (39). Although structures homologous to domains B and C were found within a single protein, β-galactosidase from Escherichia coli (PDB 1BGL) (40), the homology score was low. β-Galactosidase (1023 residues) folds into five sequential domains, 1-5, and domains 4 and 5 are homologous to domains B and C of iGDase. Both homologous domains of the two enzymes are located in the C-terminal region, but the configurations of the two domains are markedly different in each enzyme. Thus, distinct proteins homologous to domains B and C were found independently; however, no protein homologous to the sequential unit composed of domains B and C was observed.
Acarbose-complexed Structure-Acarbose is a pseudotetrasaccharide inhibitor that possesses the acarviosine unit at the non-reducing end. Although this inhibitor mimics an α-1,4-glucan, it was observed to bind iGDase. To facilitate description of the complex, the four saccharide units are labeled A-D, and the corresponding substrate binding sites are labeled as subsites −1 to +3 (41) (Fig. 4a). The subsites are numbered from the non-reducing end of acarbose. After the initial refinement of the protein structure, the difference Fourier map indicated a clear continuous electron density for a trisaccharide at the active site (Fig. 4b). The O6 hydroxyl groups were seen on the first and third saccharide units from the bottom of the substrate binding pocket, but not on the second unit. This finding suggests that the observed trisaccharide corresponds to units A-C of acarbose, and the positions of subsites −1 to +2 at the active site of iGDase were assigned accordingly. The electron density for unit D of acarbose was not identified in the difference Fourier map. The active site of iGDase consists of a shallow, wide trough with a narrow hole at the bottom. Acarbose is bound in the active site so that it protrudes from the substrate binding pocket. The A and B units are located deep in the bottom of the pocket and bound with multiple hydrogen bonds, as mentioned below (Fig. 4c). In contrast, unit C is slightly outside of the pocket and interacts with only one residue (Gln-370). The average B-factor of unit C was 38.4 Å², higher than that of unit A (26.5 Å²) and unit B (27.0 Å²). Unit D, which lies further outside the pocket than unit C, was disordered in the electron density.
The overall conformation of iGDase complexed with acarbose was essentially identical with that of the unliganded enzyme, although slight deviations were observed around the loop (residues 620-652) near the catalytic site. The substrate binding pocket seemed to be roughly divided into two parts (Fig. 4b).
One side, which mainly consists of Arg-332, Gln-370, Gln-380, and a catalytic residue, Glu-430, participates in the hydrophilic interaction between the enzyme and acarbose. The other side, containing the other catalytic residue, Glu-628, is composed of four aromatic amino acid residues, Tyr-326, Tyr-573, Trp-655, and Trp-582, which appear to contribute to the formation of a large hydrophobic wall. The schematic binding model of iGDase with acarbose is shown in Fig. 4c. Unit A (valienamine moiety) is tightly bound to the enzyme by multiple hydrogen bonds. The O4 and O6 hydroxyl groups form hydrogen bonds with Asp-333 at distances of 2.7 and 2.9 Å, respectively. The O4 hydroxyl group also makes a hydrogen bond with Arg-332 at a distance of 2.8 Å, and the O3 hydroxyl group interacts with the side chain of Arg-332. Unit B (6-deoxyglucoside moiety) makes a stacking interaction with the aromatic ring of Tyr-573, which is further stabilized by hydrogen bonds from the O2 and O3 hydroxyl groups to Glu-431 and Arg-567 at distances of 2.6 and 2.4 Å, respectively. At unit C (glucose moiety), only one group, the O6 hydroxyl, makes a weak hydrogen bond with Gln-370 at a distance of 3.1 Å.

DISCUSSION
Comparison of Glucoamylase and iGDase-Unlike GDase, GA scarcely hydrolyzes dextran. However, GDase and GA are structurally similar and both belong to GH family 15, so the structure/function relationship of these enzymes is intriguing. Several crystal structures of GAs complexed with acarbose have already been reported. In this study, the structure of iGDase was superimposed on those of a prokaryotic GA from T. thermosaccharolyticum (13) and a eukaryotic GA from A. awamori var. X-100 (42) (Fig. 5). The overall structures of domain A, where acarbose binds, closely resemble each other in these three enzymes. Also, the amino acid residues involved in subsites −1 and +1 are highly conserved among these three enzymes.
Molecular surface models of the substrate binding pocket of the enzymes were compared (Fig. 6). At subsites −1 and +1, the amino acid residues of iGDase adopt similar side-chain conformations to those of T. thermosaccharolyticum and A. awamori GAs. Unit A of acarbose and catalytic water fit into the bottom of the substrate binding pocket, and essentially no additional empty space is found in subsite −1, whereas the substrate binding pocket at subsite +1 has a comparatively wider space for binding the substrates.
In contrast to subsites −1 and +1, the amino acid residues at subsite +2 are relatively poorly conserved. The residues in the vicinity of subsite +2 form an entrance for the substrate binding pocket. Two residues located at this entrance, the positions equivalent to Gln-380 and Trp-582 of iGDase, are strikingly different among these three enzymes. Gln-380 of iGDase does not directly interact with acarbose (Figs. 4c and 6a), which gives a wide and shallow substrate binding pocket to iGDase, and this wide and shallow pocket may be favorable for accommodating the α-1,6-glucosidic linkage of dextran. In T. thermosaccharolyticum GA, Trp-390 is the counterpart of Gln-380 of iGDase; it contributes to forming subsite +2 and interacts with unit C of acarbose (Fig. 6b). In A. awamori GA, an extended loop consisting of five amino acid residues (TGSWG) is observed in this region, and the sequence of the extended loop is also found in other eukaryotic GAs (Fig. 7). This loop protrudes from the entrance of the substrate binding pocket and is bound to units C and D of acarbose (Fig. 6c). Mutation of Trp-120 of A. awamori GA, which is located in this extended loop, was reported to influence the affinity for isomaltose (43).
The residues at the position comparable to Trp-582 of iGDase, located at the opposite side of this entrance to Gln-380, are also not conserved among the above three enzymes. The plane of the side chain of Trp-582 is nearly perpendicular to the ring of unit C of acarbose. The distance between the CH3 atom of Trp-582 of iGDase and the O3 hydroxyl group of unit C is 3.0 Å. This bulky side chain of Trp-582 makes a constriction at the entrance of the substrate binding pocket (Fig. 6a), which might well influence the uptake of the substrate into this pocket. On the other hand, Tyr-590 of T. thermosaccharolyticum GA is identified as the residue corresponding to Trp-582 of iGDase. Because the side chain of the tyrosine residue is smaller (Fig. 6b), no interaction between this residue and acarbose is found, and the constriction of the substrate binding pocket is also small. In A. awamori GA, the region corresponding to that around Trp-582 of iGDase is completely lacking (Figs. 6c and 7). These observations show that, in iGDase, the region in the vicinity of Gln-380 is wide open, whereas in T. thermosaccharolyticum and A. awamori GAs, the opposite side of the entrance of the substrate binding pocket is wide open. It is likely that the differences in these regions are responsible for determining the substrate specificities of these enzymes.
The enzymatic properties of some bacterial and fungal GAs and iGDase have been reported. Their specificities for the large polysaccharides starch and dextran are strict: essentially, bacterial and fungal GAs do not hydrolyze dextran, while iGDase does not hydrolyze starch. However, bacterial GAs and iGDase have the ability to hydrolyze both α-1,4- and α-1,6-glucosidic linkages of small oligosaccharides like maltose and isomaltose, whereas fungal GAs scarcely hydrolyze isomaltose. The activity of T. thermosaccharolyticum GA for isomaltose was measured as 25% of that for maltose (44). In iGDase, the activity for maltose was estimated to be 10% of that for isomaltose (3). Also, in a bacterial GA derived from Clostridium sp. G0005, whose primary structure shows 95% homology with T. thermosaccharolyticum GA, the Km value for isomaltose was more than 20 times higher than those of fungal GAs (28). Because the structures of subsites −1 and +1 located near the catalytic residues of bacterial and fungal GAs and iGDase are nearly identical, one possible basis for the differences in their substrate specificities is the difference in the positions of the open space and the constrictions around subsite +2 in their substrate binding pockets, which are organized by the residues corresponding to those in the vicinity of Gln-380 and Trp-582 of iGDase.
Functions of Domains B and C-Although the biochemical properties of domains B and C were not tested, we considered one possibility to be that these domains serve as cell wall anchors, because the primary structure of domain C is homologous to those of SLH domains derived from T. hydrothermalis EM1 pullulanase (6) and from Pyrococcus abyssi amylopullulanase (7). Other exocellular carbohydrate polymer-metabolizing enzymes such as Clostridium thermocellum xylanase (GenBank™ accession number M67438), Bacillus sp. strain KSM-635 alkaline cellulase (M27420) (45), Thermoanaerobacter saccharolyticum endoxylanase (M97882) (46), Bacillus sp. strain XAL601 α-amylase-pullulanase (D28467) (47), and Thermoanaerobacterium thermosulfurigenes EM1 pullulanase (48) also possess SLH domains. The domain composition of the enzymes mentioned above is, however, different from that of iGDase. Except for iGDase, these enzymes each have two or three SLH domains. Brechtel and Bahl (49) constructed a series of C-terminally truncated forms of T. thermosulfurigenes EM1 xylanase by removing the gene region encoding the SLH domains and showed that multiple SLH domains are necessary for the attachment of the xylanase to the cell wall of T. thermosulfurigenes EM1. In contrast, iGDase has a single SLH domain (i.e. domain C) and also a single domain B, instead of multiple copies of SLH domains.
It is likely that the function of the combination of single domains B and C in iGDase is equivalent to that of multiple SLH domains. What might be the role of domain B? The cell surface of Gram-positive bacteria is composed of the underlying peptidoglycan layer and the overlying S-layer. The S-layer has been reported to be a regular array of proteins or glycoproteins forming two-dimensional lattices with hexagonal or tetragonal symmetry (50), and the hydrophobicity of these S-layer proteins is generally high (~40-60 mol%) (7). The hydrophobicity of the primary structure of iGDase was calculated using the ProtScale tool with the method of Kyte and Doolittle (29) at the ExPASy server, and the profile showed that domain B is more hydrophobic than any of the other domains (N, A, and C). These observations led us to conclude that domain B interacts with the S-layer by hydrophobic interactions. In this model, domain B is buried in the S-layer, and the flexible linker located between domains A and B confers motion to the catalytic unit composed of domains N and A, which is thereby capable of efficient hydrolysis of substrates located close to the cell surface.
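The hydrophobicity comparison described above can be reproduced in outline with a sliding-window average over the Kyte-Doolittle hydropathy scale. The sketch below is an illustration under stated assumptions: the window size used by the authors is not given in the text, so the default here is arbitrary, and the function names are hypothetical rather than part of ProtScale. The only sequence used is the nine-residue linker AGTPLSSPE quoted earlier in the paper.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(sequence):
    """Average Kyte-Doolittle score over a sequence (higher = more hydrophobic)."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    return sum(KYTE_DOOLITTLE[residue] for residue in sequence) / len(sequence)


def hydropathy_profile(sequence, window=9):
    """Sliding-window mean hydropathy; window=9 is an assumed default."""
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    return [
        mean_hydropathy(sequence[start:start + window])
        for start in range(len(sequence) - window + 1)
    ]


# Linker between domains A and B as quoted in the text (residues 685-693).
LINKER = "AGTPLSSPE"
print(round(mean_hydropathy(LINKER), 3))
print([round(value, 2) for value in hydropathy_profile(LINKER, window=5)])
```

Applied to the full iGDase sequence (accession AB033333), averaging such a profile over each domain's residue range would give per-domain values that could be compared in the way the text describes.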