Crystal Structure of Human β-Galactosidase

Background: Deficiencies in β-D-galactosidase cause lysosomal storage diseases. Results: This is the first report to describe the crystal structure of human β-Gal. Human β-Gal is composed of a TIM barrel domain and two β-domains. Conclusion: The disease-causing mutations were classified as mutations directly affecting ligand recognition, mutations inside the protein core, or mutations located on the protein surface. Significance: The structure provides insights into the mutations that cause lysosomal storage diseases.
GM1 gangliosidosis and Morquio B are autosomal recessive lysosomal storage diseases associated with a neurodegenerative disorder or dwarfism and skeletal abnormalities, respectively. These diseases are caused by deficiencies in the lysosomal enzyme β-D-galactosidase (β-Gal), which lead to accumulation of the β-Gal substrates GM1 ganglioside and keratan sulfate. β-Gal is an exoglycosidase that catalyzes the hydrolysis of terminal β-linked galactose residues. This study presents the crystal structures of human β-Gal in complex with its catalytic product galactose or with its inhibitor 1-deoxygalactonojirimycin. Human β-Gal is composed of a catalytic TIM barrel domain followed by β-domain 1 and β-domain 2. To gain structural insight into the molecular defects of β-Gal in these diseases, the disease-causing mutations were mapped onto the three-dimensional structure. Finally, the possible causes of the diseases are discussed.

skeletal abnormalities (7). GM1 gangliosidosis and Morquio B are rare disorders; the estimated incidence of GM1 gangliosidosis is 1:100,000-200,000 live births. The estimated incidence of Morquio B covers a wide range, from 1 case per 75,000 births in Northern Ireland to 1 case per 640,000 births in Western Australia (8).
The mechanisms by which these two disorders are caused have not been fully clarified. A variety of mutations have been identified in patients with different ethnic backgrounds (9). β-Gal activity is almost lost in patients with severe infantile GM1 gangliosidosis (type I) (<1% of normal), whereas in juvenile or adult GM1 patients (types II and III), residual β-Gal activity is detectable but very weak (<9% of normal). This indicates that the residual enzyme activity is inversely correlated with the severity of the clinical types of this disease (6,10).
Human β-Gal cDNA encodes 677 amino acid residues, which include an N-terminal 23-amino acid secretion signal (11,12) (Fig. 1A). The enzyme is synthesized as an 88-kDa precursor and is transferred to the lysosomal compartment (13), where it is processed to the 64-kDa mature enzyme by proteolytic cleavage of the C-terminal region (14,15). The C-terminal fragment remains associated with the N-terminal fragment (16). Human β-Gal forms a multienzyme complex with protective protein/cathepsin A and neuraminidase. Complex formation with these proteins is important for the proper processing and activity of β-Gal (17).
Human β-Gal is classified into glycoside hydrolase (GH) family 35 based on its amino acid sequence similarities (18). It is predicted to have a TIM barrel domain, which is typical for GH family members (19,20). Human β-Gal is a retaining glycosidase in which the product retains the same stereochemistry as the starting substrate due to a double-displacement reaction mechanism whereby two consecutive nucleophilic attacks on the anomeric carbon lead to overall retention of the anomeric configuration (13). A pair of carboxylic acids is necessary for this reaction; one carboxylic acid acts as a catalytic nucleophile, and the other acts as an acid/base catalyst. In human β-Gal, Glu-268 has been identified as the catalytic nucleophile, and Glu-188 is a candidate for the acid/base catalyst (22). Glu-188 and Glu-268 are located in the fourth and seventh β-strands of the TIM barrel domain, respectively. For this reason, human β-Gal belongs to the 4/7 superfamily (23).

EXPERIMENTAL PROCEDURES
Protein Expression, Purification, and Crystallization-The details of protein expression, purification, and crystallization have been published (32). Briefly, human β-Gal (residues 24-677) was expressed in the yeast Pichia pastoris and was purified to homogeneity. In the course of purification, polysaccharide moieties attached to the protein were trimmed off by endoglycosidase Hf treatment. For crystallization, β-Gal was subjected to limited proteolysis with bovine trypsin and was further purified. Crystallization experiments were performed with the sitting-drop vapor-diffusion method at 4°C. Crystals of the DGJ complex were obtained with a reservoir solution containing 20% (w/v) PEG3350, 0.2 M ammonium sulfate, and 100 mM Tris-HCl (pH 8.0). To prepare crystals of the β-Gal·galactose complex, crystals of the β-Gal·DGJ complex were transferred to and incubated overnight in the mother solution (25% PEG3350, 0.2 M ammonium sulfate, 100 mM Tris-HCl (pH 8.0)) supplemented with 200 mM galactose.
Data Collection and Structure Determination-Diffraction datasets were collected at beamline NW12A of the Photon Factory (Tsukuba, Japan) under cryogenic conditions at 95 K. The crystals of the β-Gal·galactose complex were cryoprotected with the mother solution supplemented with 200 mM galactose and 15% ethylene glycol. The crystals of the β-Gal·DGJ complex were cryoprotected with the mother solution supplemented with 1 mM DGJ and 15% ethylene glycol.
The datasets were processed with the HKL2000 package (33), and further analyses were carried out using the CCP4 suite (34). Structures of human β-Gal were determined by molecular replacement using the program Molrep (35) and the coordinates of BT β-Gal (PDB code 3D3A), which has 32% sequence identity with human β-Gal. The structural models of the β-Gal·galactose complex and the β-Gal·DGJ complex were refined at 1.8-Å resolution with stepwise cycles of manual model building using the program COOT (36) and restrained refinement using REFMAC (37) until the R factor converged.
The final β-Gal·galactose complex structural model contains 4 ligand molecules, 4 chloride ions, 8 sulfate ions, 8 ethylene glycol molecules, 16 N-acetylglucosamine (GlcNAc) residues, and 2244 water molecules. The β-Gal·DGJ complex structural model contains the same nonprotein atoms as the β-Gal·galactose complex, except that it contains 2185 water molecules.
The qualities of the final models were evaluated with PROCHECK (38). The most favored and the additionally allowed regions in the Ramachandran plot were 95.0 and 4.2%, respectively, for the β-Gal·galactose complex. Residues in the disallowed regions included Lys-498 (chains A-D), Asp-508 (chains A-D), Cys-127 (chains A-D), Glu-186 (chains A-D), Asn-458 (chains A and B), and Ser-611 (chain A) in the β-Gal·galactose complex; these residues were well fitted to the electron density maps. The β-Gal·DGJ complex had nearly the same quality as the β-Gal·galactose complex. The refinement statistics are summarized in Table 1 (40). The accessible surface areas (ASAs) shown in Table 2 were calculated by using AreaiMol in the CCP4 suite (34). The ASA values in Table 2 are for chain A of the β-Gal·galactose complex.
Table 2, footnote a: ligand, residues involved in ligand recognition; inter, protein core residues in the interdomain; intra, protein core residues in the intradomain; surface, residues in the protein surface.

RESULTS
Identification of Trypsinized β-Gal-After trypsinization of the recombinant β-Gal, we obtained 50- and 20-kDa fragments that correspond to the N- and C-terminal domains of β-Gal, respectively (32). We performed N-terminal peptide sequencing of these fragments. The N-terminal amino acid sequence of the 50-kDa fragment was Asp-Ala-Thr-Gln-Arg, indicating that the signal sequence of β-Gal was cleaved after Arg-25. The first amino acid should have been Asn, but it was probably not read correctly due to glycosylation. On the other hand, the N-terminal sequence of the 20-kDa fragment was Asp-Ser-Gly-His-His, indicating that the 20-kDa fragment started at Asp-531 and was cleaved after Arg-530. We also conducted mass spectrometry on the C-terminal 20-kDa fragment (Fig. 1B). The peak at m/z 15025.1 is in good agreement with the theoretical value of 15138.8 (residues 531-660), confirming that 17 residues from the C-terminal end of the 20-kDa fragment had been removed. We could not obtain mass spectrometry results for the 50-kDa fragment because of technical problems, but the 50-kDa fragment should be composed of residues 26-530, because Arg-530 was visible in the electron density map. Taken together, the results indicate that the 50- and 20-kDa fragments corresponded to the N-terminal domain (residues 26-530) and C-terminal domain (residues 531-660) of β-Gal, respectively. It has been reported that the precursor of human β-Gal is proteolytically cleaved between Ser-543 and Ser-544 and that the large domain is further processed at Arg-530 (16). Thus, the trypsinized β-Gal used in this study closely corresponds to the mature form of β-Gal (Fig. 1A).
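The residue bookkeeping in this paragraph can be rechecked with simple arithmetic. The sketch below assumes inclusive residue numbering; the helper and variable names are illustrative only and are not taken from the study. It recomputes the two fragment lengths, the number of residues trimmed from the C terminus, and the difference between the theoretical mass and the observed peak quoted in the text.

```python
# Illustrative check of the residue ranges and masses quoted in the text.
# All names here are hypothetical helpers, not part of the original work.

FULL_LENGTH = 677  # residues encoded by the human beta-Gal cDNA


def segment_length(first: int, last: int) -> int:
    """Number of residues in the inclusive range first..last."""
    return last - first + 1


n_fragment = segment_length(26, 530)    # 50-kDa N-terminal fragment
c_fragment = segment_length(531, 660)   # 20-kDa C-terminal fragment
trimmed_c_terminal = FULL_LENGTH - 660  # residues removed after residue 660

observed_mz = 15025.1        # peak reported for the C-terminal fragment
theoretical_mass = 15138.8   # value quoted for residues 531-660
mass_difference = round(theoretical_mass - observed_mz, 1)

print(n_fragment, c_fragment, trimmed_c_terminal, mass_difference)
```

The count of 17 trimmed residues follows directly from the full length of 677 residues and the last retained residue, 660.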
Structure of Human β-Gal-The deglycosylated and trypsinized form of human β-Gal in complex with its inhibitor, DGJ, was crystallized in the monoclinic space group P2₁. Initial crystals were obtained in the form of the DGJ complex by the cocrystallization method. Crystals of the complex with galactose were obtained by replacing DGJ with galactose by the soaking method.
All four molecules in the asymmetric units of the β-Gal·galactose and β-Gal·DGJ complexes were well superimposed, with root mean square deviation (r.m.s.d.) values ranging from 0.2 to 0.4 Å. The β-Gal·galactose complex showed a strong overall agreement with the β-Gal·DGJ complex, with r.m.s.d. values from 0.1 to 0.3 Å. Therefore, the structure of the β-Gal·galactose complex (chain A) is described throughout this report, unless otherwise stated.
The crystallographic asymmetric unit contained four β-Gal molecules (molecules A to D). Molecules A and B were composed of two polypeptide fragments of residues 29-530 and 545-647. Molecules C and D were composed of two polypeptide fragments of residues 29-527 and 545-647. Residues 531-544 were not visible in the electron density maps of any of the four β-Gal monomers in the asymmetric unit; therefore, these residues were not modeled.
The crystal structure showed that Arg-530 and Asn-545 are separated by about 40 Å and point in opposite directions. Therefore, it seems that these residues were connected by a flexible loop before trypsinization. Residues after Ser-647 were not visible in the electron density map.
The overall structure of the β-Gal monomer was flat, with dimensions of ~75 × 50 × 30 Å. β-Gal was folded into three domains: the TIM barrel domain, also called (α/β)₈ (residues 1-359); β-domain 1 (residues 397-514); and β-domain 2 (residues 545-647) (Figs. 1, A and C, and 2A). The TIM barrel domain formed an eight-stranded α/β barrel structure that is responsible for catalysis and is characteristic of the GH family (19). One galactose or DGJ was located in the bottom of the barrel for the β-Gal·galactose or β-Gal·DGJ complex, respectively. β-Domain 1 and β-domain 2 were composed of four- and six-stranded β sheets, respectively. The TIM barrel domain and β-domain 1 were connected by a loop region of ~50 Å (residues 360-396, hereafter referred to as the TIM-β1 loop). This loop ran across β-domain 2, and residues 372-384 in the loop region formed an additional β-strand with β-domain 2.
Human β-Gal possesses seven potential N-glycosylation sites (Asn-26, Asn-247, Asn-464, Asn-498, Asn-542, Asn-545, and Asn-555). Based on the electron density, four GlcNAcs attached to Asn-247, Asn-464, Asn-498, and Asn-555 were modeled in each β-Gal monomer. Asn-247 was located in the TIM barrel domain, Asn-464 and Asn-498 were in β-domain 1, and Asn-555 was in β-domain 2. All of these residues were found to point away from the ligand binding pocket. Therefore, the glycosylations presumably mediate the secretion and protection of β-Gal rather than its ligand binding. Asn-26 and Asn-542 were not included in the model due to a lack of electron density. The electron density around Asn-545 was too poor to model a sugar chain. We were unable to determine whether these three residues were glycosylated.
Based on the electron density, we modeled one chloride ion, two sulfate ions, and two ethylene glycol molecules in each β-Gal monomer. The chloride ion was located just beneath the ligand binding pocket and was surrounded by Tyr-83, Tyr-306, Arg-121, Gln-81, His-56, and Ser-54. This result suggests that this ion plays a structural role in forming the ligand binding pocket.
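As a consistency check, the per-monomer counts given in this section can be scaled by the four molecules in the asymmetric unit and compared with the model totals reported under Experimental Procedures. The following is a minimal sketch; the dictionary layout and names are assumptions made for illustration.

```python
# Illustrative consistency check between per-monomer counts (Results)
# and whole-model totals (Experimental Procedures).
# The data structures and names are hypothetical.

MOLECULES_PER_ASYMMETRIC_UNIT = 4

per_monomer = {
    "ligand": 1,           # one galactose or DGJ per barrel
    "chloride": 1,
    "sulfate": 2,
    "ethylene glycol": 2,
    "GlcNAc": 4,           # at Asn-247, Asn-464, Asn-498, Asn-555
}

reported_totals = {
    "ligand": 4,
    "chloride": 4,
    "sulfate": 8,
    "ethylene glycol": 8,
    "GlcNAc": 16,
}

# Scale each per-monomer count to the full asymmetric unit.
expected_totals = {
    name: count * MOLECULES_PER_ASYMMETRIC_UNIT
    for name, count in per_monomer.items()
}

# Collect any species whose scaled count disagrees with the reported total.
mismatches = {
    name: (expected_totals[name], reported_totals[name])
    for name in per_monomer
    if expected_totals[name] != reported_totals[name]
}

print(mismatches)
```

An empty dictionary means the per-monomer description and the reported totals agree.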
Dimeric Structure in Crystal-The four molecules in the asymmetric unit formed two dimers (Fig. 2B). The dimerization of human β-Gal in solution was confirmed by gel-filtration chromatography (Fig. 2C). Fig. 2B shows that the two protomers of each dimer are related by a noncrystallographic 2-fold symmetry axis running parallel to the depth direction of the β-Gal monomer. The dimer had approximate dimensions of 75 × 100 × 30 Å. The dimerization interface was composed of three regions: residues 364-374 in the TIM-β1 loop, residues 559-567 in β-domain 2 (β2 loop), and residues 63-70 in the first helix of the TIM barrel domain (α1). These three regions formed a concave surface in the lateral face of the protomer. The β2 loop of one protomer inserted into the dimerization interface of the other protomer through extensive van der Waals interactions and vice versa. The interface was rich in hydrophobic residues, especially Pro residues, including Pro-559, Pro-563, and Pro-566 in the β2 loop and Pro-364, Pro-366, Pro-367, and Pro-370 in the TIM-β1 loop (Fig. 2D). The surface area of the interface was 904 Å², which is slightly small for a dimerization interface.
Domain Organization of Human β-Gal-Among the reported crystal structures of β-Gal, BT β-Gal, Tr β-Gal, and Psp β-Gal belong to the GH35 family. Human β-Gal showed 32% sequence identity with BT β-Gal. The structure of human β-Gal resembled that of BT β-Gal (779 amino acids, monomer) (PDB code 3D3A), with an r.m.s.d. of 1.2 Å for 474 Cα atoms. Both proteins showed similar domain organizations, with a TIM barrel domain and two β-domains. Psp β-Gal (1011 amino acids, monomer) and Tr β-Gal (1023 amino acids, monomer) had 23 and 16% overall sequence similarities with human β-Gal, respectively. Psp β-Gal and Tr β-Gal also showed domain organizations similar to that of human β-Gal but possessed two or three additional β-domains (29,30) (Fig. 3) whose functional roles remain to be elucidated.
Other bacterial β-Gal structures differed considerably in their oligomerization state and domain organization, and their similarity with human β-Gal was limited mainly to the TIM barrel domain. C221 β-Gal (1,023 amino acids, hexamer) belongs to the GH2 family. Its active site opens to the central cavity of the hexamer and is connected by eight channels with exterior solvent (26). EC β-Gal (1,024 amino acids, tetramer) also belongs to the GH2 family, and it is the best characterized of the β-Gals, both structurally and biochemically. Each active site is complemented by a loop of a neighboring molecule that is inserted into the active site (27). In human β-Gal, the loop region of β-domain 1 (residues 482-491) plays the same role as the complementation loop of EC β-Gal.
A4 β-Gal (645 amino acids, trimer) belongs to the GH42 family. Each monomer of A4 β-Gal has an active site located inside a large central tunnel formed by the interface of the trimer (28). The active site of human β-Gal is easily accessible from the bulk solvent, in contrast to the active sites of A4 β-Gal, C221 β-Gal, and EC β-Gal, which are segregated from the bulk solvent.
Ligand Recognition and Catalytic Mechanism-The catalytic residues, Glu-268 and Glu-188, were located in a deep well in the TIM barrel domain. The electron densities corresponding to the bound ligands in the β-Gal·galactose and β-Gal·DGJ complexes were clear enough to generate unambiguous models. Fig. 4A shows that a galactose molecule is bound to each monomer in the chair conformation, and its 1-OH group has the β-anomer configuration. All of the hydroxyl groups of the ligands made hydrogen bonds with β-Gal; 10 and 8 direct hydrogen bonds were formed between galactose and β-Gal and between DGJ and β-Gal, respectively (Fig. 4, A-D, and supplemental Table S1). Additional hydrogen bonds mediated by a couple of water molecules were formed between the ligands and β-Gal. Moreover, aromatic and hydrophobic residues in the active site contributed to ligand recognition via extensive van der Waals interactions (Fig. 4B). These interactions explain how β-Gal specifically recognizes the terminal galactose molecule.
In the β-Gal·galactose complex, the 1-OH group was hydrogen-bonded to the carboxylate group of Glu-188 and, concurrently, to a water molecule. Previously, Glu-268 was reported to be a nucleophile and Glu-188 to be an acid/base catalyst, with the reaction proceeding by a double-displacement mechanism (22). In the crystal structure, the configuration of the active site residues around the galactose was suitable for nucleophilic attack by the Glu-268 carboxylate on the C1 position of galactose. The side chain of Glu-268 was oriented correctly toward the substrate by forming hydrogen bonds to Tyr-270, Arg-121, and Tyr-83. In the course of the nucleophilic attack by Glu-268, Glu-188 acts as an acid catalyst. Subsequently, Glu-188 acts as a base, activating a water molecule that, in turn, attacks the C1 position of galactose and releases the galactose residue.
Glycoside hydrolases are classified as retaining or inverting enzymes, depending on whether the configuration at the anomeric position is conserved through the reaction (41). The two classes differ in the distance between the oxygens of the two catalytic carboxylates; this distance ranges from 4.5 to 6.5 Å in retaining enzymes and from 9.0 to 9.5 Å in inverting enzymes (42). The retaining enzyme forms a covalent intermediate with its substrate, whereas the inverting enzyme activates a water molecule to hydrolyze the substrate directly. In the structure of human β-Gal, the average value of this distance for the four protomers was 5.1 Å, confirming that human β-Gal hydrolyzes its substrate in a retaining manner (13).
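The distance criterion described above amounts to a simple range test. The sketch below is illustrative only: the function name is hypothetical, the ranges are the ones quoted in this section, and distances falling outside both ranges are reported as unclassified rather than forced into either class.

```python
# Illustrative range test for the carboxylate-carboxylate distance criterion.
# Ranges are those quoted in the text; the function name is hypothetical.

RETAINING_RANGE = (4.5, 6.5)   # angstroms
INVERTING_RANGE = (9.0, 9.5)   # angstroms


def classify_mechanism(distance: float) -> str:
    """Classify a glycosidase from the distance between the oxygens
    of its two catalytic carboxylates (in angstroms)."""
    if RETAINING_RANGE[0] <= distance <= RETAINING_RANGE[1]:
        return "retaining"
    if INVERTING_RANGE[0] <= distance <= INVERTING_RANGE[1]:
        return "inverting"
    return "unclassified"


# Average distance reported for the four protomers of human beta-Gal.
human_beta_gal = classify_mechanism(5.1)
print(human_beta_gal)
```

Returning "unclassified" for intermediate distances is a design choice of this sketch; the text does not say how such cases should be treated.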

DISCUSSION
In this study, we determined the first crystal structure of human β-D-galactosidase. Our structural study revealed that human β-Gal has a domain organization that is distinct from those of previously reported β-Gal structures.
Homodimer of Human β-Gal-Several groups have reported that mature β-Gal isolated from mammalian tissues forms a dimer (43-45) or a monomer (46). β-Gal has also been isolated as a dimer or tetramer (16). These discrepancies could be due to differences in the experimental procedures employed and in particular to differences in pH. Porcine β-Gal exhibits pH-dependent oligomerization; it adopts a monomeric form at neutral pH (pH 7.0) and reversibly associates to form a dimer at acidic pH (43). Considering that the optimal pH for activity is around 4.5, the active form is most probably a dimer. Norden et al. (46) reported that β-Gal is monomeric, but the experiment was performed at neutral pH.
Judging from the structure, treatment with glycosidase and trypsin is unlikely to affect oligomerization because the glycosylation sites and the cleavage site are located far from the dimerization region. Taken together, these reports strongly suggest that β-Gal forms a dimer under physiological conditions, but a definitive demonstration of its oligomeric structure must await further investigation.
Substrate Binding Mode-Domain analysis showed that β-Gal has no known protein motifs other than the catalytic domain. A structural similarity search using DALI (47) revealed that both β-domains share structural homology with β-domains of other galactosidases with high homology scores (>7). Interestingly, the β-domains show a weak homology to the carbohydrate binding module 35 (CBM35) (48). The comparison of β-domain 1 and CBM35 (PDB code 2W87) yielded a Z-score of 5.3, an r.m.s.d. value of 2.8 Å for 82/138 residues, and a sequence identity of 7%. The comparison of β-domain 2 and CBM35 (PDB code 2W87) yielded a Z-score of 4.9, an r.m.s.d. value of 2.6 Å for 74/138 residues, and a sequence identity of 12%. In addition to the low homology, the topology of the two structures is different, and the Trp residue that is important for ligand recognition in CBM35 is not conserved in the β-domains of human β-Gal. Therefore, the β-domains of human β-Gal are unlikely to have an affinity for sugar.
Mutations Causing GM1 Gangliosidosis and Morquio B Disease-The structure provides insights into the mutations that cause these lysosomal storage diseases. Nearly 100 gene mutations causing GM1 gangliosidosis and Morquio B disease are reported in the UniProt data base (40). We mapped these mutations onto the structure of human β-Gal to elucidate relationships between the location of the mutation and its effect on the activity of β-Gal (Fig. 5, A and B, and Table 2). The mutations, which were scattered throughout the β-Gal structure, were classified as mutations directly affecting ligand recognition, mutations inside the protein core, or mutations located on the protein surface (Fig. 5B).
(Fig. 6A). The side chain of Trp-273 was found to lie in the entrance of the ligand binding pocket, tuning the shape of the pocket and partially covering the pocket by interacting with the ligand. When Trp-273 was mutated to a residue with a smaller side chain, such as His, it could not effectively cover the ligand in the proper position. The side chain of Tyr-83 was almost buried in the bottom of the ligand binding pocket, with its phenol ring stacked on the side chains of Arg-121, His-56, and Ile-126. Moreover, the OH group of the side chain was hydrogen-bonded with Arg-121, the carbonyl oxygen of Ile-126, Glu-268, Asn-187, and O3 of galactose. The side chain of Tyr-333 formed part of the lateral face of the ligand binding pocket, and its OH group interacted with O6 of galactose. Tyr-270 made a water-mediated interaction with O1 of galactose and also hydrogen-bonded with Glu-268, thereby fixing the catalytic carboxylate of Glu-268 in the appropriate position for the catalytic reaction. Collectively, mutations at these sites would disrupt the shape of the ligand binding pocket. Such disruptions would reduce affinity and/or catalytic efficiency, leading to dysfunction of the enzyme.
Mutations Inside Protein Core-Mutations inside the protein core, which account for 60% of all mutations, can be subdivided into mutations within the domain and mutations located in the domain interface.
Mutations of Arg-59 to Cys or His are known to cause the infantile forms of GM1 gangliosidosis. Arg-59 was almost totally buried inside the protein core, as shown by an ASA of 3.1 Å², near the bottom of the ligand binding pocket (7 Å to the ligand). Its side chain formed hydrogen bonds with His-56 and Asn-313 and with the carbonyl oxygen atoms of Ala-128 and Glu-129 (Fig. 6B). Moreover, the guanidino moiety of Arg-59 was stacked between the side chains of Trp-130 and Tyr-331. These interactions stabilized the structure of the ligand binding pocket. When Arg-59 is mutated to Cys or His, which have smaller side chains, these stabilizing interactions would be disrupted, and the shape of the ligand binding pocket would change.
Arg-121 (R121S, infantile GM1 gangliosidosis) was also buried near the ligand binding pocket (5 Å to the ligand), where it played a structural role in maintaining the pocket. Arg-590 (R590C, infantile GM1 gangliosidosis, ASA of 5.4 Å²) in β-domain 2 linked the three domains of β-Gal by making contacts with Glu-478 in β-domain 1, Asn-318 in the TIM barrel domain, and the carbonyl oxygen atom of Asn-479 in β-domain 1 (Fig. 6C). Tyr-316, Asn-318, Thr-329, Arg-442, Arg-482, Lys-578, and Tyr-591 also made interdomain interactions. Therefore, when these residues are mutated, the appropriate domain organization would be disrupted, which would destabilize the mutant proteins and/or eliminate the surface needed to bind neuraminidase and protective protein/cathepsin A, preventing formation of the multiprotein complex required for proper transport to the lysosomes. These observations support the idea that the three-domain organization of human β-Gal is indispensable for it to exert its normal function in lysosomes.
Mutations Located in Protein Surface-In contrast to mutations in the protein core region, which cause large structural rearrangements of the protein, mutations located in the protein surface had milder effects on the protein; the effect of such mutations could be balanced by local rearrangement of the nearby residues. For example, Arg-201 (R201C, infantile GM1 gangliosidosis, ASA of 118.9 Å²) was located on the lateral face of the TIM barrel domain, which is far from the ligand binding pocket (22 Å to the ligand). The side chain of Arg-201 was completely exposed to the solvent and formed a salt bridge to Asp-198, mutation of which (D198Y) is a known cause of Morquio B (Fig. 6D). No large structural rearrangement occurred in either mutation except for loss of the salt bridge.
R208C is also a known GM1 gangliosidosis mutation. In this case, the side chain of Arg-208 hydrogen-bonded to the carbonyl oxygen atoms of Gly-212 and Val-215 and to two water molecules. These interactions stabilized the loop structure connecting the α4 helix and the β5 strand (Fig. 6D). Such mutations in the solvent-exposed regions caused local structural rearrangements and changed the surface properties of the protein, such as the electrostatic potentials and the interface to other proteins.
Fig. 5 legend (fragment): The residues associated with infantile, juvenile, and adult GM1 gangliosidosis (GM1 G1, GM1 G2, and GM1 G3, respectively) and with Morquio B are shown in orange, purple, blue, and yellow, respectively. The mutations that reportedly show more than one phenotype are colored as the most severe phenotype. The bound galactose molecules are shown in space-filling representation. B, the disease-related mutations are classified according to the location of the mutation. Mutations having a direct influence on ligand recognition are shown in red, mutations in the protein core in the domain interface are shown in light cyan, mutations in the protein core in the intradomain are shown in light green, and mutations in the protein surface are shown in light blue.
Rationale for Effect of Mutations-In general, we observed that the residues mutated in GM1 gangliosidosis and Morquio B disease tend not to be very solvent-exposed. Most mutations in the protein core region would have drastic effects on the overall protein structure and stability, especially for the TIM barrel domain that is responsible for catalysis. Indeed, infantile GM1 gangliosidosis mutations, which are associated with the most detrimental phenotype, were concentrated in the protein core regions. Mutations associated with milder phenotypes, such as juvenile or adult GM1 gangliosidosis, tended to be exposed to the solvent. The distribution of the mutations associated with Morquio B disease was somewhat biased toward the vicinity of the ligand binding pocket and β-domain 2. However, the effects of some mutations with the most detrimental phenotype were difficult to explain because they were located on the surface and probably caused little structural change. With the structure of human β-Gal presented here, it may become possible to locate the region of the defect causing the development of disease symptoms and to predict, to some extent, the severity of the disease from the three-dimensional protein structure.
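The relationship between solvent exposure and mutation class can be illustrated with the three ASA values quoted in the Discussion. The sketch below is not the classification used for Table 2: the 20 Å² cutoff is an arbitrary assumption chosen only for illustration, and the function name is hypothetical.

```python
# Illustrative buried/exposed split using ASA values quoted in the text.
# ASSUMPTION: the 20 A^2 cutoff is arbitrary and is NOT taken from the study.

ASA_CUTOFF = 20.0  # square angstroms, hypothetical threshold

quoted_asa = {
    "Arg-59": 3.1,     # R59C/R59H, infantile GM1 gangliosidosis
    "Arg-590": 5.4,    # R590C, infantile GM1 gangliosidosis
    "Arg-201": 118.9,  # R201C, infantile GM1 gangliosidosis
}


def exposure_class(asa: float, cutoff: float = ASA_CUTOFF) -> str:
    """Label a residue as buried or exposed from its accessible surface area."""
    return "buried" if asa < cutoff else "exposed"


classes = {residue: exposure_class(asa) for residue, asa in quoted_asa.items()}
for residue, label in classes.items():
    print(f"{residue}: {quoted_asa[residue]} A^2 -> {label}")
```

Arg-201 illustrates the exception noted above: it is associated with the infantile phenotype even though it is fully solvent-exposed.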