The Structure of an Archaeal β-Glucosaminidase Provides Insight into Glycoside Hydrolase Evolution

The archaeal exo-β-D-glucosaminidase (GlmA) is a dimeric enzyme that hydrolyzes chitosan oligosaccharides into glucosamine monomers. GlmA belongs to subfamily 35 of the glycoside hydrolase (GH)-A superfamily and has a novel primary structure. Here, we present the crystal structure of GlmA in complex with glucosamine at 1.27 Å resolution. The structure reveals that the GlmA monomer shares structural homology with GH42 β-galactosidases, whereas the spatial positions of most of its active site residues are identical to those of GH35 β-galactosidases. We found that upon dimerization, the active site of GlmA changes shape, enhancing its ability to hydrolyze smaller substrates in a manner similar to that of homotrimeric GH42 β-galactosidase. However, GlmA differentiates glucosamine from galactose by means of a single charged residue, while using the “evolutionary heritage residues” it shares with GH35 β-galactosidases. Our study suggests that GH35 and GH42 β-galactosidases evolved by exploiting the structural features of GlmA.

Chitin is a polysaccharide consisting of β-1,4-linked N-acetylglucosamine (GlcNAc). It is a major constituent of fungal cell walls, the exoskeletons of insects, and the shells of crustaceans. Glucosamine (GlcN), which is derived from the hydrolysis of deacetylated chitin (chitosan), has a variety of biological functions and has therefore been used as a food additive and in medicines. Exo-β-D-glucosaminidase (EC 3.2.1.165) catalyzes the hydrolysis of the β(1-4) linkage of chitosan oligosaccharides to remove a GlcN residue from the non-reducing terminus; retaining enzymes of this type are classified into glycoside hydrolase (GH) subfamilies 2 and 35 according to the Carbohydrate-Active Enzymes (CAZy) database (1). This enzyme is found in bacteria and archaea and has been investigated extensively because of its ability to produce monomeric GlcN.
The role of exo-β-D-glucosaminidase in the chitin catabolic pathway of hyperthermophilic archaea has been defined. Chitinase (ChiA) (EC 3.2.1.14) initiates the degradation of chitin into diacetylchitobiose ((GlcNAc)₂), which is then deacetylated at its non-reducing GlcNAc residue by a deacetylase (Dac) (EC 3.5.1.-) (2). The resulting GlcN-GlcNAc is hydrolyzed into GlcN and GlcNAc by exo-β-D-glucosaminidase, and the released GlcNAc is further deacetylated to GlcN by Dac (2, 3). To understand how these enzymes catalyze their reactions and adapt to extremely high temperatures, we previously determined the structures of ChiA (4–6) and Dac (7); however, the structure of exo-β-D-glucosaminidase remained unknown. Two exo-β-D-glucosaminidases from hyperthermophilic archaea, both called GlmA, have been described to date: GlmA Tk from Thermococcus kodakaraensis KOD1 (3) and GlmA Ph from Pyrococcus horikoshii (8). GlmA Tk and GlmA Ph share 63% sequence identity, show the same substrate specificities, and exist as dimers in solution, suggesting that their tertiary structures and catalytic mechanisms are probably identical.
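The bookkeeping of this pathway can be sketched as a small stoichiometry model. This is a hypothetical illustration, not code from this study: the species labels and the `run_pathway` helper are our own, water consumed by hydrolysis and deacetylation is ignored, the ChiA step on polymeric chitin is omitted, and the steps are applied sequentially even though the enzymes act concurrently. It shows that each (GlcNAc)₂ ultimately yields two GlcN and two acetate.

```python
from collections import Counter

# Hypothetical stoichiometry model of the archaeal pathway downstream of ChiA.
# Each step: (enzyme, species consumed per event, species produced per event).
# Water consumed by hydrolysis/deacetylation is deliberately ignored.
STEPS = [
    # Dac deacetylates the non-reducing GlcNAc of diacetylchitobiose
    ("Dac", Counter({"(GlcNAc)2": 1}), Counter({"GlcN-GlcNAc": 1, "acetate": 1})),
    # GlmA releases the non-reducing GlcN
    ("GlmA", Counter({"GlcN-GlcNAc": 1}), Counter({"GlcN": 1, "GlcNAc": 1})),
    # Dac deacetylates the released GlcNAc monomer
    ("Dac", Counter({"GlcNAc": 1}), Counter({"GlcN": 1, "acetate": 1})),
]


def run_pathway(pool: Counter) -> Counter:
    """Apply each step, in order, as many times as the current pool allows."""
    pool = Counter(pool)
    for _enzyme, consumed, produced in STEPS:
        events = min(pool[s] // k for s, k in consumed.items())
        for s, k in consumed.items():
            pool[s] -= events * k
        for s, k in produced.items():
            pool[s] += events * k
    return +pool  # unary plus drops species whose count fell to zero


print(run_pathway(Counter({"(GlcNAc)2": 1})))
```

The acetate tally is worth keeping in mind because acetate accumulates in the assay mixture alongside GlcN.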
GlmA belongs to the GH35 subfamily of the GH-A superfamily, which is the largest GH superfamily and contains 19 subfamilies. All members of this superfamily share a TIM-barrel fold as the catalytic domain, which contains two catalytic carboxylic acids that act as the acid/base catalyst and the nucleophile (9, 10). Most characterized GH35 enzymes are β-galactosidases (EC 3.2.1.23), which hydrolyze β(1-3) and β(1-4) galactosyl bonds in oligosaccharides. Interestingly, the sequence of GlmA is homologous to parts of GH35 and GH42 β-galactosidases, although GlmA does not exhibit β-galactosidase activity (3), and the highly conserved motifs around the catalytic residues of these β-galactosidases are not conserved in GlmA (3). Furthermore, GlmA Tk exhibits weak β-glucosidase activity in addition to its major β-glucosaminidase activity (3). The only exo-β-D-glucosaminidase structure determined among GH-A members is that of CsxA from the bacterium Amycolatopsis orientalis, a member of the GH2 subfamily (11). However, GlmA differs from CsxA in substrate specificity and oligomerization state (3, 11), and the two share low sequence similarity. These observations suggest that GlmA might have a unique active site structure unlike that of CsxA.
To fill this gap, we determined the structures of GlmA Ph and GlmA Tk, the first reported archaeal exo-β-D-glucosaminidase structures. The high-resolution structure of the product complex also reveals unique structural features of GlmA that link the molecular evolution of GH35 and GH42 β-galactosidases in GH-A.

Results
Overall Structure-The structure of GlmA Ph was solved by single-wavelength anomalous dispersion of selenomethionine selenium atoms and refined at 2.6 Å resolution (Table 1). The structure of GlmA Tk in complex with its reaction product, GlcN, was determined at 1.27 Å resolution by molecular replacement using the GlmA Ph monomer as the search model (Table 1). GlmA Ph and GlmA Tk have almost identical tertiary structures, with a root mean square deviation (RMSD) of 0.90 Å for 775 Cα atoms (Fig. 1A). We therefore describe the higher-resolution structure of GlmA Tk throughout this report unless otherwise noted. In the GlmA Tk structure, the asymmetric unit contains two polypeptides (chains A and B) related by a non-crystallographic 2-fold axis (Fig. 1B). This dimer assembly is consistent with gel filtration chromatography data, suggesting that the crystal structure of GlmA Tk corresponds to the biologically relevant form of the protein. The GlmA Tk monomer contains three distinct domains (Fig. 1B). The N-terminal domain is a (β/α)8-barrel, or TIM-barrel, domain (residues 1–435) that contains the catalytic machinery and is common among glycoside hydrolases. The second domain is an α/β-fold domain with central β-sheets and α-helices (residues 436–648). The C-terminal domain is a β-fold domain with antiparallel β-sheets (residues 649–786). Each of the three domains in chains A and B interacts extensively with all three domains of the other chain. Because the fraction of buried atoms and the interactions formed (salt bridges, ion networks, and hydrogen bonds) might be important for thermostability (12), we analyzed the GlmA Tk dimer interface using PISA (Protein Interfaces, Surfaces, and Assemblies) (13). The analysis showed that dimer formation buries a large surface area of 5,530 Å² and creates 29 hydrogen bonds and 16 salt bridges.
These findings indicate that the two monomers are intimately associated, which might contribute to the high thermostability of GlmA Tk.
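The buried-area figure follows from solvent-accessible surface areas (SASA). A minimal sketch, assuming the usual definition (SASA of the isolated chains minus SASA of the complex); the function names are hypothetical. As we understand PISA's convention, its reported "interface area" is half of the total buried SASA, and the text does not state which convention the 5,530 Å² value follows.

```python
def buried_sasa(sasa_chain_a: float, sasa_chain_b: float, sasa_dimer: float) -> float:
    """Total solvent-accessible surface area (Å^2) buried when two chains associate."""
    buried = sasa_chain_a + sasa_chain_b - sasa_dimer
    if buried < 0:
        raise ValueError("dimer SASA cannot exceed the sum of the isolated chains")
    return buried


def interface_area(sasa_chain_a: float, sasa_chain_b: float, sasa_dimer: float) -> float:
    """Per-interface area, defined here as half of the total buried SASA."""
    return buried_sasa(sasa_chain_a, sasa_chain_b, sasa_dimer) / 2.0
```

The SASA inputs themselves would come from a structure-analysis tool run on the deposited coordinates; no such values are reproduced here.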
FIGURE 1. The overall structure of GlmA. A, structural superposition of GlmA Ph (cyan) and GlmA Tk (orange), shown as a ribbon diagram. B, two views of the GlmA Tk dimer. GlmA Tk is a homodimer (chains A and B), and each chain comprises three distinct domains (TIM-barrel (red), α/β (blue), and β (green)). The bound GlcN is shown as spheres. Arrow, N-terminal β-sheet (residues 4–13).

Structural Comparison with Other Glycoside Hydrolases-Structural similarity searches of the whole protein using the Dali server revealed that the GlmA Tk dimer does not resemble any other known structure. Surprisingly, however, the GlmA Tk monomer shares substantial similarity with GH42 β-galactosidases, whereas its TIM-barrel domain is more similar to that of GH35 β-galactosidases than to that of GH42, as explained below. First, despite low sequence similarities (15–17%), the three domains of the GlmA Tk monomer could be readily superimposed on four GH42 β-galactosidases with Z scores >25 and RMSD values of 2.6–3.0 Å for equivalent Cα atoms, except for 80 residues of the C-terminal region of GlmA Tk (Fig. 2A). This result indicates that these β-galactosidases are spatially homologous to GlmA Tk in both individual domain folds and domain orientation, although their quaternary structures differ from that of GlmA Tk (Fig. 3A) and they belong to a different GH family within the GH-A superfamily. In contrast, the TIM-barrel domain of GlmA Tk (residues 1–435) is most similar to four GH35 β-galactosidases, with Z scores >36, RMSD values of 1.8–2.3 Å (Fig. 2B), and sequence identities of 22–32%, although the overall structures of these β-galactosidases are distinct from that of GlmA Tk (Fig. 3B). When only the TIM-barrel domains of GH42 β-galactosidases were compared, the RMSD values and sequence identities were 2.3–2.9 Å and 13–18%, respectively. Based on these values, the TIM-barrel domain of GlmA Tk is more similar to those of GH35 than to those of GH42, and, as described below, the active site architecture of GlmA Tk closely resembles that of GH35 β-galactosidases. These structural features agree with a previous sequence analysis, in which the N-terminal and central regions of GlmA Tk showed homology to GH35 and GH42 β-galactosidases, respectively (3). From a structural perspective, these results also suggest that GlmA Tk is a common ancestor of these β-galactosidases.
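The RMSD values quoted above summarize how far equivalent Cα atoms lie apart once two structures are superposed. A minimal sketch of that final step only, assuming the residue equivalences and the optimal superposition (which Dali determines with its own alignment method, alongside the Z score) have already been computed; the function name is hypothetical.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def ca_rmsd(model_a: Sequence[Coord], model_b: Sequence[Coord]) -> float:
    """Root mean square deviation (Å) over paired, already-superposed Cα atoms.

    No fitting is done here: model_a[i] and model_b[i] must be equivalent
    residues expressed in a common coordinate frame.
    """
    if len(model_a) != len(model_b) or not model_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(model_a, model_b))
    return math.sqrt(squared / len(model_a))
```

Because the RMSD depends on which residues are counted as equivalent, the same pair of proteins can give different values for whole-monomer and TIM-barrel-only comparisons, as in the numbers above.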
The Active Site Similarities between GlmA Tk and GH35 β-Galactosidase-The 1.27 Å resolution structure of the product complex reveals unambiguous electron density for a single GlcN molecule in the deep pocket within the TIM-barrel domain of each monomer (Fig. 4A). To gain insight into substrate recognition by GlmA Tk, we compared its TIM-barrel structure with that of the GH35 β-galactosidase from Trichoderma reesei (Tri-β-gal) in complex with galactose (PDB entry 3OGR (14)). We chose this structure because it has the highest resolution (1.5 Å) among the GH35 enzymes of known structure, and its functional active site residues are highly conserved across the family, apart from some conservative substitutions (data not shown).
Interestingly, superimposing the TIM-barrel domain of GlmA Tk on that of Tri-β-gal revealed a high degree of structural similarity between the −1 subsites of these proteins (Fig. 4B), although their substrates differ. Both the GlcN and galactose molecules adopt a chair conformation with the anomeric hydroxyl group (O1) in the β configuration. They occupy almost the same position and form direct hydrogen bonds with eight residues. Four of the eight residues involved in direct substrate binding in GlmA Tk (Tyr 53 (interacting with O3), Glu 103 (O4, O6), Glu 179 (O1), and Glu 347 (N2)) could be superimposed onto Tyr 96 (O3), Glu 142 (O4, O6), Glu 200 (O1), and Glu 298 (O2) of Tri-β-gal, respectively, resulting in almost identical protein–carbohydrate interactions with no substantial differences in interatomic distances (Fig. 4, C and D). Gly 102 of GlmA Tk forms a hydrogen bond with O3 (2.9 Å) of GlcN via its main-chain amide, whereas the structurally equivalent residue in Tri-β-gal, Ala 141, fulfills the same function through a hydrogen bond to O3 (2.9 Å) of galactose (Fig. 4, C and D). Because neither side chain can donate this hydrogen bond, both contacts are made by the main chain, and the replacement is effectively a conservative substitution. In addition, Trp 308 of GlmA Tk, which stacks hydrophobically against the planar face of the GlcN ring, overlaps well with Tyr 260 of Tri-β-gal (Fig. 4B). By contrast, the residues important for substrate recognition in GlmA Tk are not conserved in GH42 β-galactosidases (data not shown), despite the structural similarity of their monomers.
Among the substrate-binding residues described above, Glu 179 and Glu 347 of GlmA Tk are predicted to be the catalytic residues; their structural counterparts in Tri-β-gal are the acid/base Glu 200 and the nucleophile Glu 298, respectively (Fig. 4E). Glu 179 forms a hydrogen bond with O1 (2.7 Å) and is oriented toward the glycosidic oxygen, whereas Glu 347 forms a hydrogen bond with N2 (2.8 Å) and is positioned to serve as the catalytic nucleophile (Fig. 4, C and E). Consistent with these predicted roles, the E347Q mutation virtually inactivated the enzyme, and the E179Q mutant retained less than ~3% residual hydrolytic activity (Fig. 5, A and B) (Table 2). In these measurements, acetate was produced as one of the reaction products (Fig. 5A) and can act as a nucleophile (15–17); the residual activity of E179Q might therefore be ascribed to chemical rescue by acetate. In addition, Glu 179 and Glu 347 are located on the β4- and β7-strands of the TIM-barrel domain, respectively, and the average distance between their carboxyl oxygen atoms is 4.8 Å, consistent with the common structural features of retaining enzymes in GH-A (9). Thus, these results strongly suggest that, like the GH35 enzymes characterized thus far, GlmA Tk hydrolyzes its substrate by a double-displacement retaining mechanism, using Glu 179 as the acid/base and Glu 347 as the nucleophile. The almost entirely conserved catalytic center indicates a close evolutionary relationship between these enzymes.
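The 4.8 Å catalytic-residue separation can be computed directly from model coordinates. The sketch below is a hypothetical illustration: it assumes "average distance" means the mean over all pairwise distances between the side-chain carboxylate oxygens of the two Glu residues (OE1/OE2 of each), which the text does not specify, and it compares the result with the widely cited rule of thumb that the two catalytic carboxylates lie about 5.5 Å apart in retaining glycosidases and about 10 Å apart in inverting ones. The 7.5 Å cutoff in `mechanism_hint` is an arbitrary midpoint chosen for illustration, not a published threshold.

```python
import itertools
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]

# Illustrative midpoint between the typical ~5.5 Å (retaining) and ~10 Å
# (inverting) carboxylate separations; an assumption, not a published cutoff.
RETAINING_CUTOFF_A = 7.5


def mean_carboxylate_distance(oxygens_1: Sequence[Coord], oxygens_2: Sequence[Coord]) -> float:
    """Mean of all pairwise O-O distances (Å) between two carboxylate groups."""
    pairs = list(itertools.product(oxygens_1, oxygens_2))
    if not pairs:
        raise ValueError("both residues need at least one oxygen coordinate")
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)


def mechanism_hint(distance_a: float) -> str:
    """Crude geometric hint only; the mechanism must be confirmed experimentally."""
    return "retaining-like" if distance_a < RETAINING_CUTOFF_A else "inverting-like"


# The value reported for the GlmA Tk Glu179/Glu347 pair:
print(mechanism_hint(4.8))  # retaining-like
```

Geometry alone is only suggestive, which is why the mutagenesis data above carry the main weight of the argument.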
Based on the sequence alignments (Fig. 2, A and B), the acid/base residue Glu 179 of GlmA Tk aligns with the corresponding residues of both GH35 and GH42 β-galactosidases. The nucleophile Glu 347 also aligns with those of GH42 β-galactosidases but could not be aligned with those of GH35 β-galactosidases. Together with the fact that the highly conserved motifs around the catalytic residues of GH35 and GH42 β-galactosidases are not conserved in GlmA Tk (3), these results indicate that the catalytic residues of GlmA Tk cannot be reliably located by sequence comparison alone. Determining the structure of GlmA Tk and combining it with structure-guided mutagenesis allowed us to identify the catalytic residues accurately.
JOURNAL OF BIOLOGICAL CHEMISTRY 5001

The Discrimination of GlcN from Galactose by GlmA Tk-Despite the high structural similarity of the active sites, marked differences were observed in the remaining substrate-binding residues of GlmA Tk: Asp 178, Tyr 379, and Glu 306. The first two correlate with the chemical structure of GlcN. GlcN and galactose differ in the substituent at C2, in the configuration at C4, and in the orientation of the O6 hydroxyl group. The major difference is the substituent at C2, which is an amine group (N2) in GlcN and a hydroxyl group (O2) in galactose. Asn 199 of Tri-β-gal, which precedes the acid/base residue Glu 200, forms a hydrogen bond (2.9 Å) with O2 (Fig. 4, D and F), and this Asn-Glu motif is highly conserved in GH35 and GH42 β-galactosidases (Fig. 2, A and B). The equivalent motif in GlmA Tk is Asp 178-Glu 179 (the acid/base), and Asp 178 forms a hydrogen bond (2.7 Å) with N2 (Fig. 4, C and F). The pKa of the GlcN amine (N2) is reported to be 7.4 (18); therefore, at pH 6.0, where GlmA Tk activity is maximal (3), N2 will be predominantly in its protonated NH3+ form, whereas the side chain of Asp 178 will be negatively charged, given the average Asp pKa of ~3.7. To test the importance of the acidic carboxyl group of Asp 178, we mutated it to asparagine. The D178N mutant lost most of its catalytic activity (Fig. 5B) (Table 2), implying that charge–charge complementarity between Asp 178 and N2 of GlcN is indispensable. The importance of this interaction is also supported by a previous report that GlmA Tk has very weak β-glucosidase activity (3). Glucose differs from GlcN only at C2, which carries a hydroxyl group (O2); the absence of a charged interaction between Asp 178 and O2 of glucose would therefore explain the much lower β-glucosidase activity. Additionally, GlmA Tk could not hydrolyze (GlcNAc)₂ at all (3). GlcNAc also differs from GlcN only in its C2 substituent, a bulky acetamido group, so GlcNAc cannot be accommodated at the −1 subsite because of a steric clash with Asp 178.

An additional difference between GlcN and galactose is the orientation of O4, which is equatorial in GlcN and axial in galactose. Tyr 379 of GlmA Tk forms a hydrogen bond (2.8 Å) with the equatorial O4 of GlcN (Fig. 4, C and G) and also forms the lateral face of the hydrophobic pocket that accommodates GlcN. Surprisingly, Tyr 379 could be superimposed onto Tyr 342 of Tri-β-gal (Fig. 4, B and G), a strictly conserved residue in GH35 β-galactosidases. Tyr 342 packs against C4 of galactose in a manner similar to Tyr 379 of GlmA Tk; however, it is too distant (~4.6 Å) to form a hydrogen bond with the axial O4 of galactose. Instead, Asn 140 of Tri-β-gal is positioned to form a hydrogen bond (2.8 Å) with the axial O4 (Fig. 4, D and G). Asn 140 could in turn be superimposed onto Cys 101 of GlmA Tk (Fig. 4, B and G), and other GH35 β-galactosidases, such as BgaC and Hs-β-gal, also have Cys residues at this position. In BgaC, the counterpart Cys 96 forms a hydrogen bond with the axial O4 of galactose via its thiol group (19). Cys 101 of GlmA Tk therefore likely represents a conservative substitution. However, it is located 4.8 Å from the equatorial O4 of GlcN, too far to form a hydrogen bond.
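The protonation argument for Asp 178 and N2 can be checked with the Henderson-Hasselbalch equation. A minimal sketch using only the values quoted above (amine pKa 7.4, average Asp pKa ~3.7, pH 6.0); the function name is ours, and actual active-site pKa values can be shifted by the local environment, so these are solution-based estimates.

```python
def fraction_protonated(pka: float, ph: float) -> float:
    """Fraction of a titratable group carrying its proton (Henderson-Hasselbalch)."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


ASSAY_PH = 6.0        # pH of maximal GlmA Tk activity, as quoted in the text
GLCN_AMINE_PKA = 7.4  # N2 of GlcN, as quoted in the text
ASP_PKA = 3.7         # average Asp side-chain pKa, as quoted in the text

amine_as_nh3 = fraction_protonated(GLCN_AMINE_PKA, ASSAY_PH)  # -NH3+ fraction
asp_as_coo = 1.0 - fraction_protonated(ASP_PKA, ASSAY_PH)     # -COO- fraction

print(f"GlcN N2 as NH3+: {amine_as_nh3:.1%}")  # GlcN N2 as NH3+: 96.2%
print(f"Asp178 as COO-: {asp_as_coo:.1%}")     # Asp178 as COO-: 99.5%
```

Both groups are thus predominantly charged under the assay conditions, which is the premise of the charge–charge complementarity argument.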
In short, GlmA Tk and GH35 β-galactosidases both retain a Cys (Asn in Tri-β-gal) and a Tyr residue positioned to hydrogen-bond with the axial and the equatorial O4 of the substrate, respectively; which of the two actually engages O4 contributes to recognizing galactose or GlcN. We therefore term these Cys (Asn) and Tyr residues "evolutionary heritage residues." To our knowledge, this is the first time such a heritage has been observed in glycoside hydrolases with different functions.
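The heritage-residue logic amounts to asking which of the two conserved positions lies within hydrogen-bonding distance of O4 in each complex. A minimal sketch using the residue-O4 distances quoted above; the 3.5 Å donor-acceptor cutoff is a common generic assumption rather than a value from this study, and the dictionary layout and function name are hypothetical.

```python
# Residue-to-O4 distances (Å) quoted in the text for each product complex.
O4_DISTANCES = {
    "GlmA Tk + GlcN (equatorial O4)": {"Cys101": 4.8, "Tyr379": 2.8},
    "Tri-beta-gal + galactose (axial O4)": {"Asn140": 2.8, "Tyr342": 4.6},
}

HBOND_CUTOFF_A = 3.5  # generic donor-acceptor cutoff; an assumption


def o4_partners(distances: dict, cutoff: float = HBOND_CUTOFF_A) -> list:
    """Residues close enough to O4 to plausibly form a hydrogen bond with it."""
    return sorted(res for res, d in distances.items() if d <= cutoff)


for complex_name, distances in O4_DISTANCES.items():
    print(complex_name, "->", o4_partners(distances))
```

In each complex only one of the two structurally equivalent positions engages O4, and which one does tracks the orientation of O4 in the bound sugar.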
The other unique substrate-binding residue of GlmA Tk is Glu 306, which forms a hydrogen bond (2.9 Å) with O1 (Fig. 4, C and H) and is important for maximal catalytic activity, as shown by the ~2.5-fold decrease in activity of the E306Q mutant (Fig. 5B) (Table 2). In Tri-β-gal, Asp 258 occupies this position (Fig. 4, B and H), but its side chain is oriented such that it cannot form a hydrogen bond with O1 of galactose (~4.6 Å away). Because O1 has the same β-anomeric configuration in GlcN and galactose, Glu 306 is unlikely to discriminate between the two sugars and may instead contribute to transition state stabilization. As described above, the O6 hydroxyl groups of GlcN and galactose form hydrogen bonds with Glu 103 of GlmA Tk and Glu 142 of Tri-β-gal, respectively, despite their different orientations (Fig. 4, C and D). However, Tyr 364 of Tri-β-gal forms an additional hydrogen bond with the equatorial O6 of galactose (Fig. 4D), whereas no corresponding residue is present in GlmA Tk. Tyr 364 is well conserved in other GH35 β-galactosidases (14, 19–21), suggesting that it contributes to galactose recognition, although probably not as a major determinant.
These data suggest that Asp 178, Glu 306, and Tyr 379 of GlmA Tk play important roles in recognizing or stabilizing the GlcN molecule. However, as explained above, Asp 178 appears to be the most important residue for GlcN recognition.
In addition, a sequence alignment of GlmA Tk and GlmA Ph showed that the catalytic residues, together with the other key residues of GlmA Tk discussed above, are strictly conserved in GlmA Ph (data not shown), consistent with the mutagenesis analyses of GlmA Ph (Fig. 5C) (Table 2). These findings suggest that the catalytic mechanisms and substrate profiles of the two enzymes are probably identical.
The Dimer Structure Influences Substrate Specificity-In the TIM-barrel domain, one end of the β-barrel is closed by the N-terminal β-sheet (residues 4–13) (Fig. 1B), whereas the other end, termed the "catalytic face" (22), becomes buried within a deep and narrow pocket upon dimer formation. The most notable feature of the dimer interface is that the 3₁₀-helix of the α/β-domain protrudes toward the catalytic face of the adjacent monomer, interacting through several hydrogen bonds and a salt bridge. These interactions involve Arg 563, Asn 565, and Arg 567 of the α/β-domain and Asp 132, Tyr 134, Tyr 135, and Gln 188 of the TIM-barrel domain (Fig. 6A). This interaction narrows the active site entrance, and the α/β-domains and TIM-barrel domains of chains A and B together create one large cavity that abuts the active site pockets of both monomers (Fig. 6A). Because the opposite side of the catalytic face is closed, this cavity is the only route for substrate entry and product egress. The active site pocket is ~20 Å deep from the center of the cavity (Fig. 6B), which could restrict the access of long substrates. Consistent with this, GlmA Tk showed higher activity toward GlcN₂ (~12 Å long) than toward longer oligomers (GlcNₙ, n > 2) (3). These results suggest that dimer formation is essential for GlmA Tk to form an active site of appropriate shape and thus to exhibit enzymatic activity. By contrast, CsxA, the only other exo-β-D-glucosaminidase with a known structure, hydrolyzes oligomeric substrates from GlcN₂ to GlcN₆ with similar efficiencies (11). CsxA, a member of the GH2 family of the GH-A group, functions as a monomer whose active site is readily accessible to solvent, a feature suited to longer substrates; its domain organization also differs substantially from that of GlmA Tk (data not shown).
a The activities of the mutants were calculated from the area of peak I relative to that of the wild type.
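The normalization in this footnote can be written out explicitly. A minimal sketch, assuming the integrated area of peak I scales linearly with the extent of reaction at the fixed 150-min end point (so this is an end-point comparison, not an initial-rate measurement); the function names are hypothetical, and no measured peak areas are reproduced here. For example, a ~2.5-fold decrease, as reported for E306Q, corresponds to ~40% of wild-type activity.

```python
def relative_activity_pct(peak_area_mutant: float, peak_area_wild_type: float) -> float:
    """Mutant activity as a percentage of wild type, from integrated peak I areas."""
    if peak_area_wild_type <= 0:
        raise ValueError("wild-type peak area must be positive")
    return 100.0 * peak_area_mutant / peak_area_wild_type


def fold_decrease(relative_pct: float) -> float:
    """Convert a relative activity (% of wild type) into an n-fold decrease."""
    if relative_pct <= 0:
        raise ValueError("relative activity must be positive to express as a fold change")
    return 100.0 / relative_pct


print(fold_decrease(40.0))            # 2.5
print(round(fold_decrease(3.0), 1))   # 33.3, the bound implied by "less than ~3%"
```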

Discussion
The structure of GlmA Tk provides new insights into the structural composition, substrate recognition mechanisms, and molecular evolution of these enzymes. In brief, the GlmA Tk monomer shares substantial structural similarity with GH42 β-galactosidases, whereas most of its active site residues are conserved with GH35 β-galactosidases, and subtle differences from the galactose-bound GH35 active site allow GlmA Tk to discriminate glucosamine from galactose. In particular, Asp 178 of GlmA Tk plays an essential role in discriminating GlcN from galactose, whereas the equivalent residue in GH35 β-galactosidases is Asn. To the best of our knowledge, this is the first observation of such a high degree of conservation across the entire catalytic centers of enzymes with different functions. In addition, the evolutionary heritage residues, which are positioned to hydrogen-bond with the axial (Cys/Asn) and equatorial (Tyr) O4 of the substrate, are an interesting finding that emphasizes the high evolutionary conservation of these enzymes. These structural features strongly suggest that GlmA is a common ancestor of these β-galactosidases, as discussed below.
The active sites of glycoside hydrolases are classified into three types: cleft, tunnel, and pocket (23). GlmA and GH42 β-galactosidases both have a cleft-type active site in their monomeric forms; however, the active site changes to a pocket type upon oligomerization, which better accommodates smaller substrates (24) (Fig. 6). Thus, oligomerization may be a key factor in size-based substrate specificity and in the high stability of these proteins. β-Galactosidase may have evolved from a prototypical single TIM-barrel domain with a cleft- or tunnel-type active site; then, as the active site was modified to prefer smaller substrates, extra domains were added that converted it from a cleft to a pocket (25). As described above, the monomer structure and part of the sequence of GlmA are similar to those of GH42 β-galactosidases, suggesting that GH42 β-galactosidases might have emerged from an evolutionary branch originating from GlmA in archaea, an ancient lineage, and then differentiated into other members of the glycoside hydrolase family. In addition, the monomer framework (i.e. the domain organization) might be suitable or even necessary for oligomerization. However, the substrate-binding residues of GH42 enzymes are not conserved in GlmA (data not shown), which excludes GlmA from the GH42 family, and the evolutionary selection pressures that led to this diversity in the active site remain unknown. In contrast, the residues within the active site pocket are well conserved between GlmA and GH35 β-galactosidases, suggesting that GH35 β-galactosidases evolved from archaeal exo-β-D-glucosaminidase through gene duplication. Enzyme substrate ambiguity is probably the starting point for the evolution of divergent enzymes through gene duplication (26, 27).
Consistent with this idea, GlmA Tk exhibits broad substrate specificity, showing weak hydrolytic activity toward various β-disaccharides (such as cellobiose and laminaribiose) in addition to its major β-glucosaminidase activity (3). Thus, the promiscuous activities of GlmA might have provided a starting point from which mutations drove the functional adaptation of the newly emerged β-galactosidases (favoring β-galactosides), while the original substrate-binding residues and catalytic machinery were retained. Moreover, the highly conserved active site residues of both GlmA and GH35 β-galactosidases indicate that relatively weak evolutionary pressure acted on the catalytic center as the enzymes acquired different functions. As described above, GlmA and the GH35 and GH42 β-galactosidases belong to the same GH-A "superfamily." A superfamily is a group whose members show significant similarity in tertiary structure together with conservation of the catalytic residues and mechanism, and they are therefore considered to share a common ancestry (28). Accordingly, our finding that GlmA shares structural and mechanistic features with both GH35 and GH42 β-galactosidases strongly suggests that GlmA is a common ancestor of these β-galactosidases.
Taken together, our results suggest that GH35 and GH42 β-galactosidases evolved by exploiting the structural features of GlmA. The structural information reported here could be used to design new enzymes, such as thermostable β-galactosidases or β-glucosidases, by subtly changing the active site residues of GlmA.

Experimental Procedures
Protein Expression and Purification-The genes encoding GlmA Ph (residues 1–778, GenBank™ accession number PH_RS02375) and GlmA Tk (residues 1–786, GenBank™ accession number AB100422) were codon-optimized for expression in E. coli and synthesized (Eurofins MWG Operon). The GlmA Ph and GlmA Tk constructs were cloned into the pCold-II vector (Takara Bio) and the pET-32b vector (Merck Millipore), respectively, using the NdeI and EcoRI restriction sites, with an N-terminal PreScission protease cleavage site followed by a hexahistidine tag. The resulting vectors were transformed into E. coli Rosetta (DE3)pLysS (Merck Millipore). For GlmA Ph expression, cells were grown at 37°C in lysogeny broth (LB) medium to an A600 of 0.5, and cultivation was then continued for 24 h at 15°C. GlmA Tk overexpression was induced with 1.0 mM isopropyl β-D-1-thiogalactopyranoside (IPTG) at an A600 of 0.5, and the culture was incubated for 4 h at 37°C. Cells were harvested by centrifugation at 5,000 × g for 15 min and stored at −20°C. For protein purification, the cells were disrupted by sonication in buffer A (50 mM Tris-HCl, 0.5 M NaCl, pH 8.0). The lysate was heat-treated at 80°C for 30 min, and cell debris and insoluble proteins were then removed by centrifugation at 15,000 × g for 30 min. The supernatant was loaded onto a nickel-nitrilotriacetic acid (GE Healthcare) column equilibrated with buffer A, and the bound protein was eluted with buffer A containing 0.5 M imidazole. The eluted protein was then dialyzed against 50 mM Tris-HCl (pH 8.0) in the presence of PreScission protease at 4°C overnight. The cleaved tag was removed by a second nickel-nitrilotriacetic acid step, and the flow-through was further purified on a HiTrap Q HP column (GE Healthcare) with a linear gradient of 0–1.0 M NaCl in 50 mM Tris-HCl (pH 8.0).
The protein was then applied to a HiLoad 26/600 Superdex 200 prep grade column (GE Healthcare) equilibrated with 20 mM Tris-HCl, 150 mM NaCl, pH 8.0, concentrated to 10 mg/ml, and stored at −80°C. Point mutants of GlmA Ph and GlmA Tk were generated by site-directed mutagenesis, and all mutant proteins were expressed and purified in the same manner as the wild-type proteins. The expression levels and yields of the mutants were comparable with those of the wild-type enzymes.
Crystallography-After many crystallization trials of GlmA Ph and GlmA Tk, only GlmA Ph produced crystals. GlmA Ph crystals were grown at 20°C by sitting-drop vapor diffusion in 100 mM Tris-HCl (pH 8.5) and 20% (w/v) polyethylene glycol (PEG) 1000 with a protein/reservoir volume ratio of 1:1 and appeared within a week. We then prepared selenomethionine (SeMet)-substituted GlmA Ph as described under "Protein Expression and Purification," except that the cells were grown in LeMaster broth (29). The SeMet protein was crystallized under the same conditions as the native protein. Next, co-crystallization of GlmA Ph and GlmA Tk with GlcN (10, 50, and 100 mM) was attempted using commercial crystallization screens; crystals were obtained only for the GlmA Tk-GlcN complex, not for the GlmA Ph-GlcN complex. Among the conditions tested, well-diffracting crystals of the GlmA Tk-GlcN complex were obtained by sitting-drop vapor diffusion at 20°C in 100 mM sodium propionate-sodium cacodylate-Bis-Tris propane buffer (pH 7.0), 25% (w/v) PEG 1500, and 50 mM GlcN. Crystals of GlmA Ph (SeMet and native) and of the GlmA Tk-GlcN complex were cryoprotected with 15% (v/v) glycerol and 25% (w/v) PEG 400, respectively, and flash-frozen in liquid nitrogen. All data sets were collected at beamline BL44XU of SPring-8 (Harima, Japan) with an MX300HE detector (Rayonix) under a 90 K cryostream and were processed and scaled with the HKL-2000 program suite (30). Data sets for native and SeMet GlmA Ph were collected at wavelengths of 0.9 and 0.97898 Å, respectively, the latter at the selenium absorption peak for single-wavelength anomalous dispersion phasing. Initial phases for GlmA Ph were determined at 3.5 Å resolution from the SeMet data using Phenix (31) and then extended to 2.6 Å using the native data.
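The two wavelengths translate into X-ray energies via E = hc/λ, with hc ≈ 12.398 keV·Å. The sketch below is illustrative and not part of the authors' processing; it shows that 0.97898 Å (~12.665 keV) lies just above the tabulated selenium K absorption edge (about 12.658 keV), as expected for a peak data set, whereas 0.9 Å (~13.776 keV) is well away from the edge. The edge value is an approximate tabulated figure, not a measured fluorescence scan from this study.

```python
HC_KEV_ANGSTROM = 12.398419843  # Planck constant x speed of light, in keV·Å
SE_K_EDGE_KEV = 12.658          # tabulated Se K absorption edge (approximate)


def photon_energy_kev(wavelength_angstrom: float) -> float:
    """X-ray photon energy (keV) for a given wavelength (Å): E = hc / lambda."""
    return HC_KEV_ANGSTROM / wavelength_angstrom


for label, wavelength in [("SeMet (peak)", 0.97898), ("native", 0.9)]:
    energy = photon_energy_kev(wavelength)
    offset_ev = (energy - SE_K_EDGE_KEV) * 1000.0
    print(f"{label}: {energy:.3f} keV ({offset_ev:+.0f} eV from Se K edge)")
```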
The GlmA Ph model was built using Phenix, and further model building was performed with Coot (32). The structure was refined with CNS (Crystallography and Nuclear Magnetic Resonance (NMR) System) (33) and REFMAC (Refinement of Macromolecular Structures) (34), including rigid-body refinement. Data sets of the GlmA Tk-GlcN complex were also collected at a wavelength of 0.9 Å. The GlmA Tk-GlcN complex structure was solved by molecular replacement with Phaser (35) using the GlmA Ph structure as the search model, and initial model building, refinement, and addition of water molecules were performed with ARP/wARP (36). Further model building and refinement were performed in Coot and REFMAC, respectively, with individual anisotropic B-factor refinement. The final GlmA Ph and GlmA Tk-GlcN models were validated with MolProbity (37).
Enzymatic Activities of GlmA Tk and GlmA Ph Mutants-The hydrolase activities of the GlmAs were monitored by ¹H NMR spectroscopy. NMR experiments were conducted at 35°C on a Varian Inova 600-MHz spectrometer (Palo Alto, CA) equipped with a z-gradient triple-resonance TR probe. Chemical shifts were referenced to the internal standard 4,4-dimethyl-4-silapentane-1-sulfonic acid. The GlmA substrate GlcN-GlcNAc was prepared as follows: 1.6 mM (GlcNAc)₂ was incubated with 5 μM Dac for 10 min in 50 mM potassium phosphate buffer (pH 6.0) containing 10% (w/w) D₂O. After this first reaction was complete, the second reaction was initiated by adding wild-type or mutant GlmA to a final concentration of 5 μM in the presence of Dac, and spectra were collected after 150 min. Because one of the products, GlcNAc, is immediately deacetylated to GlcN by Dac, the N-acetyl signal of GlcN-GlcNAc was expected to disappear once the GlmA reaction was complete (Fig. 5A). The water signal was suppressed using the WATERGATE pulse sequence (38). One-dimensional ¹H spectra were acquired with 8,192 sampling points over a spectral width of 15 ppm, with a relaxation delay of 1 s and 64 scans per spectrum; the repetition time was 1.45 s. Chemical shifts were assigned as described previously (7).
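The reported 1.45-s repetition time can be reconciled with the other acquisition parameters. A minimal sketch, assuming that the "8,192 sampling points" count real and imaginary values together (as the np parameter does on Varian spectrometers, giving an acquisition time of np / (2 × spectral width in Hz)) and taking 600 MHz as the nominal ¹H frequency; the function name is hypothetical. Under these assumptions the acquisition time is ~0.455 s, and adding the 1-s relaxation delay gives ~1.455 s per scan, matching the quoted value.

```python
def acquisition_time_s(total_points: int, sweep_width_ppm: float, spectrometer_mhz: float) -> float:
    """FID acquisition time (s) for quadrature-detected data.

    Assumes total_points counts real + imaginary values (np on Varian),
    i.e. total_points / 2 complex points sampled at a dwell time of 1 / SW.
    """
    sweep_width_hz = sweep_width_ppm * spectrometer_mhz  # ppm x MHz = Hz
    complex_points = total_points / 2
    return complex_points / sweep_width_hz


at = acquisition_time_s(8192, 15.0, 600.0)  # 4096 complex points / 9000 Hz
relaxation_delay_s = 1.0
repetition_s = at + relaxation_delay_s
total_time_s = 64 * repetition_s            # 64 scans, ignoring any dummy scans
print(f"acquisition {at:.3f} s, repetition {repetition_s:.3f} s, total {total_time_s:.0f} s")
```

If the 8,192 points instead counted complex pairs, the acquisition time would double to ~0.91 s and no longer match the stated repetition time, which is why the real-plus-imaginary reading seems the more plausible one.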