The three-dimensional structure of invertase (beta-fructosidase) from Thermotoga maritima reveals a bimodular arrangement and an evolutionary relationship between retaining and inverting glycosidases.

Thermotoga maritima invertase (beta-fructosidase) hydrolyzes sucrose to release fructose and glucose, which are major carbon and energy sources for both prokaryotes and eukaryotes. The name "invertase" was given to this enzyme over a century ago, because the 1:1 mixture of glucose and fructose that it produces was named "invert sugar." Despite its name, the enzyme operates with a mechanism leading to the retention of the anomeric configuration at the site of cleavage. The enzyme belongs to family GH32 of the sequence-based classification of glycosidases. The crystal structure, determined at 2-Å resolution, reveals two modules, namely a five-bladed beta-propeller, structurally similar to the beta-propellers of glycosidases from families GH43 and GH68, connected to a beta-sandwich module. Three carboxylates at the bottom of a deep, negatively charged funnel-shaped depression of the beta-propeller are essential for catalysis and function as nucleophile, general acid, and transition state stabilizer, respectively. The catalytic machinery of invertase is perfectly superimposable on that of the enzymes of families GH43 and GH68. The variation in the position of the furanose ring at the site of cleavage explains the different mechanisms evident in families GH32 and GH68 (retaining) and GH43 (inverting) furanosidases.

Invertase, the β-D-fructofuranosidase (EC 3.2.1.26) that cleaves sucrose into fructose and glucose, is one of the earliest discovered enzymes. It was isolated in the second half of the 19th century, and its name was coined because the enzyme produces "invert" sugar, which is a 1:1 mixture of dextrorotatory D-glucose and levorotatory D-fructose (1). Because of its chemical structure, sucrose can be cleaved by either α-glucosidase or β-fructofuranosidase activity. Koshland and Stein established that invertase is a β-fructofuranosidase by performing the reaction in ¹⁸O-labeled water and determining the ¹⁸O content of the products (2). The transfructosylation activity of invertase indicated that the enzyme operates with a molecular mechanism leading to overall retention of the anomeric configuration (2). The breakdown of sucrose is widely used as a carbon or energy source by bacteria, fungi, and plants. In plants, both glucose and fructose are implicated in the signaling pathways by which sucrose concentration functions as a key sensor of the nutritional status of plants, and, thus, invertase plays a fundamental role in controlling cell differentiation and development (3,4). Commercially, invertase is mainly used in the confectionery industry, where fructose is preferred over sucrose because of a sweeter taste and a lower propensity to crystallize.
Although animals, including man, display a strong preference for sucrose-containing diets, their genomes do not encode invertases. Instead, they use a different and unrelated enzyme, sucrose α-glucosidase (EC 3.2.1.48), to hydrolyze sucrose. The genomes of human gut microorganisms such as Bacteroides thetaiotaomicron (5) and Bifidobacterium longum (6) do possess invertase genes, demonstrating that these organisms benefit from the large intake of sucrose by humans.
Glycoside hydrolases or glycosidases are a widespread group of enzymes displaying a great variety of protein folds and substrate specificities. They share a common defining feature in two critically located acidic residues, which make up the catalytic machinery responsible for the cleavage of glycosidic bonds. These two invariant residues have been identified experimentally in yeast invertase as an aspartate located close to the N terminus acting as the nucleophile (8) and a glutamate acting as the general acid/base (9). The enzymatic hydrolysis of glycosidic bonds has two possible stereochemical outcomes, inversion or retention of the anomeric configuration. Invertase is a retaining enzyme (2). With no known exception to date, the molecular mechanism appears conserved among the members of a given sequence-based family (10,11). Sensitive sequence analyses coupled to structural comparisons have revealed significant similarities between representatives of different families, accompanied by a conservation of the catalytic machinery and of the stereochemical outcome of the reaction, reflecting ancient divergence from a common ancestor to acquire novel substrate specificities (12). The evolutionarily, structurally, and mechanistically related families were grouped together into a higher hierarchical level termed "clans" (10).
Threading analyses and homology modeling have led to the prediction that, as a member of glycosidase family GH32, invertase would display a six-bladed β-propeller fold related to that of influenza virus neuraminidase (13). However, the recent report on the three-dimensional structure of the family GH68 levansucrase from Bacillus subtilis (14) revealed that it had a novel five-bladed propeller fold, which has only been described previously for tachylectin (15) and for the family GH43 α-L-arabinanase from Cellvibrio japonicus (16). Recent detailed sequence analyses have revealed the existence of sequence motifs conserved in the glycosidase families GH32, GH43, GH62, and GH68, suggesting a possible structural relationship between these families (17) despite the opposite mechanisms in GH32 and GH68 (retaining) and GH43 (inverting). It should be noted that, because of the rapid mutarotation of furanoses, it is very difficult to experimentally determine the stereochemical course of the reaction catalyzed by furanosidases such as the family GH43 α-L-arabinofuranosidases. Three independent reports have, however, concluded that family GH43 enzymes operate by an inverting mechanism (18-20). The mechanism prevailing in family GH62 is not known.
After over a century of investigations and almost 40 years since the first crystal structure of a protein was solved, no three-dimensional structure of an invertase or of any member of glycosidase family GH32 has been reported. Here we report the three-dimensional crystal structure of Thermotoga maritima invertase. This thermostable enzyme has recently been biochemically characterized by Liebl et al. (21), who have determined that it liberates fructose from various substrates such as sucrose, raffinose, and inulin. The structure not only provides a template for all members of family GH32 (including invertases, inulinases, levanases, exo-inulinases, sucrose:sucrose 1-fructosyltransferases, and fructan:fructan 1-fructosyltransferases), but it also allows dissection of the exquisite details that distinguish retaining and inverting furanosidases with a perfectly superimposable catalytic machinery.

EXPERIMENTAL PROCEDURES
Protein Cloning, Expression, and Purification-Genomic DNA of T. maritima strain MSB8 (DSM 3109), kindly provided by Dr. Wolfgang Liebl (Georg-August-Universität, Göttingen, Germany), was used to amplify the invertase gene (GenBank™ accession number AAD36485). The Escherichia coli strains used were DH5α for cloning experiments and BL21pLysS for expression. Vector pDONR is from Invitrogen, whereas vector pDEST17O/I is the Invitrogen vector pDEST17 modified by insertion of lacO and lacI to prevent expression leakage.
The invertase gene was amplified using INV-F (5′-TTCAAGC-CGAATTATCACTT-3′) and INV-R (5′-TCACAACCATATGTTCTCGA-3′) primers containing recombination sequences for integration in Gateway™ vectors. PCR was performed using 500 ng of total genomic DNA of T. maritima, 300 nM each primer, 1.2 mM dNTP, 2.5 units of Pfx polymerase (Invitrogen), 1× Pfx buffer (Invitrogen), and 1 mM MgSO₄. The amplification program was 94°C for 5 min followed by 30 cycles of 94°C for 45 s, 55°C for 30 s, and 68°C for 2 min. The amplification was completed with a final extension at 68°C for 10 min. The amplification product was purified by precipitation in 30% polyethylene glycol 8000 and 30 mM MgCl₂ and resuspended in 50 µl of TE buffer (10 mM Tris and 0.5 mM EDTA, pH 7.5). The PCR product was cloned in pDONR (Invitrogen) and then in pDEST17O/I vectors as described in the manual supplied by Invitrogen (22) to obtain the plasmid pINV.
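As a quick planning aid, the hold times of the amplification program described above can be summed. This is only a minimal sketch: the step durations are taken from the text, whereas the function name and data layout are illustrative, and ramping time between temperatures (which depends on the thermocycler) is ignored.

```python
# Thermocycling program as stated in the text (all times in seconds).
INITIAL_DENATURATION_S = 5 * 60          # 94 °C for 5 min
CYCLE_STEPS_S = [
    ("denaturation, 94 °C", 45),
    ("annealing, 55 °C", 30),
    ("extension, 68 °C", 2 * 60),
]
N_CYCLES = 30
FINAL_EXTENSION_S = 10 * 60              # 68 °C for 10 min


def program_duration_s(initial_s, cycle_steps, n_cycles, final_s):
    """Total hold time of a PCR program, ignoring temperature ramping."""
    per_cycle = sum(seconds for _, seconds in cycle_steps)
    return initial_s + n_cycles * per_cycle + final_s


total_s = program_duration_s(
    INITIAL_DENATURATION_S, CYCLE_STEPS_S, N_CYCLES, FINAL_EXTENSION_S
)
print(f"Total hold time: {total_s} s ({total_s / 60:.1f} min)")
```

The hold times alone come to just under two hours; the real run time will be longer by the instrument-dependent ramping time.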
A single colony of BL21 pLysS containing the pINV plasmid was used to inoculate 40 ml of TBAC (Terrific broth supplemented with 100 µg/ml ampicillin and 34 µg/ml chloramphenicol). The culture was incubated overnight at 37°C with constant shaking. This culture was used to inoculate 3 liters of TBAC. Incubation was done at 37°C with vigorous shaking (240 rpm), and 0.5 mM isopropyl-1-thio-β-D-galactopyranoside was added when A₆₀₀ reached 0.8. This induction was followed by a further 4-h incubation at 37°C. Cultures were pelleted and then resuspended in 50 ml of lysis buffer (50 mM Tris, pH 8, 150 mM NaCl, 10 mM imidazole, 1 mM EDTA, and 0.1% Triton X-100) containing 1 mM phenylmethylsulfonyl fluoride and 0.25 mg/ml lysozyme. This cell suspension was kept overnight at −80°C. After thawing, the lysate was supplemented with 10 µg/ml DNase I and 20 mM MgSO₄ and then incubated at 37°C until it became fluid. The supernatant containing soluble proteins was separated from the pellet by centrifugation (20,000 × g) for 30 min at 4°C.
The selenomethionine-substituted (SeMet) protein was produced as follows. A single colony of BL21 pLysS containing the pINV plasmid was used to inoculate TBAC followed by overnight incubation at 37°C. This culture was washed several times to remove the traces of TBAC medium and then used to inoculate 2 liters of M9 medium (medium from Difco supplemented with 2 mM MgSO₄, 0.36% glucose, 100 µM CaCl₂, 100 µg/ml ampicillin, and 34 µg/ml chloramphenicol). Incubation was performed at 37°C under vigorous shaking (240 rpm). When the A₆₀₀ reached 0.5, 1.5 mM L-lysine, 1.5 mM L-phenylalanine, 1.5 mM L-threonine, 0.8 mM L-leucine, 0.8 mM L-isoleucine, 0.8 mM L-valine, and 0.5 mM seleno-L-methionine (final concentrations) were added. After 30 min of incubation at 37°C, 0.5 mM isopropyl-1-thio-β-D-galactopyranoside was added to the culture. After induction, expression was followed by measuring A₆₀₀ until a value of 1.7 was reached. Culture lysis was done as described above.
In all cases, the supernatant of the 20,000 × g centrifugation was filtered (Amicon, 0.2-µm pore size membrane), and the invertase was then purified in two steps. First, nickel affinity chromatography was performed using buffers containing 50 mM Tris, pH 8, 200 mM NaCl, and 50 and 500 mM imidazole for the washing and elution steps, respectively. Subsequently, the protein was submitted to gel filtration on a Sephadex column (Amersham Biosciences). The fractions containing the protein were pooled and concentrated to 11 mg/ml for the native protein and 8 mg/ml for the SeMet protein over ultrafiltration styrene acrylonitrile membranes (Millipore; cut-off was 30 kDa).
To verify that the N-terminal His-tag did not influence the enzymatic activity of invertase, the hydrolysis of sucrose by the purified protein was monitored. The method employed was adapted from Kidby and Davidson (23) and consisted of the measurement of reducing sugars by ferricyanide. Invertase (200 µM) was incubated at 75°C in 100 mM sodium acetate buffer (pH 5.5) and 120 mM sucrose, i.e. exactly the same conditions as those described by Liebl et al. (21). Samples of 100 µl were taken at different times of incubation. The enzymatic reaction was revealed by mixing samples with 1 ml of reagent (1 mM K₃Fe(CN)₆, 130 mM Ca₂O₃, and 5 mM NaOH) and by heating the samples for 7 min at 95°C. The activity was monitored by the decrease of A₄₂₀ as a function of time and led to values (data not shown) very similar to those published by Liebl et al. (21).
Crystallization of Native and SeMet-substituted Proteins-Crystallization conditions were first investigated using two sparse matrix sampling kits (Molecular Dimensions and Stura Footprint). Optimized crystals of a suitable size were obtained by mixing 15% polyethylene glycol 1000, 150 mM Li₂SO₄, and 100 mM sodium citrate at pH 4.2 with 11 mg ml⁻¹ native protein. Crystals grew within 3 days at 20°C by the vapor diffusion method. The conditions for the SeMet-substituted protein were 17% polyethylene glycol 1000, 50 mM Li₂SO₄, 1% isopropanol, and 100 mM sodium citrate buffer at pH 4.2. Here the drops were composed of 2 µl of protein at a concentration of 8 mg ml⁻¹ with 1 µl of reservoir solution. The crystals of both native and SeMet-substituted protein belonged to space group P2₁ with unit cell parameters a = 94.2 Å, b = 113.2 Å, c = 129.6 Å, and β = 98.96°. The asymmetric unit contains six monomers, giving a V_M value of 2.2 Å³ Da⁻¹ and 43% solvent content.
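The packing figures quoted above can be cross-checked with a short calculation. The sketch below uses the cell parameters and the six monomers per asymmetric unit from the text, together with two general facts: space group P2₁ has two asymmetric units per cell, and the solvent fraction is commonly approximated as 1 − 1.23/V_M. The monomer mass of 52 kDa is an assumption made here for illustration (roughly what one would expect for the His-tagged 432-residue chain); it is not a value given in the text.

```python
import math


def monoclinic_cell_volume(a, b, c, beta_deg):
    """Unit cell volume in Å^3 for a monoclinic cell: V = a * b * c * sin(beta)."""
    return a * b * c * math.sin(math.radians(beta_deg))


def matthews_coefficient(cell_volume, asu_per_cell, mol_per_asu, mol_weight_da):
    """Matthews coefficient V_M in Å^3/Da: cell volume per dalton of protein."""
    return cell_volume / (asu_per_cell * mol_per_asu * mol_weight_da)


def solvent_fraction(v_m):
    """Common approximation for the solvent fraction: 1 - 1.23 / V_M."""
    return 1.0 - 1.23 / v_m


# Cell parameters and molecules per asymmetric unit as reported in the text.
volume = monoclinic_cell_volume(94.2, 113.2, 129.6, 98.96)
ASU_PER_CELL_P21 = 2          # general positions in P2(1)
MONOMERS_PER_ASU = 6
ASSUMED_MW_DA = 52_000.0      # ASSUMPTION: hypothetical monomer mass, not from the text

v_m = matthews_coefficient(volume, ASU_PER_CELL_P21, MONOMERS_PER_ASU, ASSUMED_MW_DA)
solvent = solvent_fraction(v_m)
print(f"V = {volume:.0f} Å^3, V_M = {v_m:.2f} Å^3/Da, solvent = {100 * solvent:.0f}%")
```

Under this assumed mass the calculation lands close to the reported V_M and solvent content; a different monomer mass shifts both values proportionally.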
Data Collection and Phasing-Crystals were soaked in mother liquor supplemented with 15% glycerol (w/v) before flash freezing in a cryogenic nitrogen stream at 100 K. Diffraction data of native and SeMet-substituted protein crystals, both in space group P2₁, were collected at the European Synchrotron Radiation Facility (ESRF, Grenoble, France) on beam lines ID14-EH2 and ID29, respectively (Table I). The data on SeMet-substituted crystals were collected at the absorption peak (λ = 0.97904 Å) and phased using the SAD method. Forty of the 48 SeMet positions were determined by anomalous Patterson maps using the subroutine XPREP of the program package SHELX5.0 (24). The 40 sites were refined with SHARP (25), and the missing eight positions were found in the residual maps. The 48 selenium positions appeared to be arranged in a manner that suggested the presence of three dimers. Symmetry averaged initial phases (DMMULTI; Ref. 26), using two of the three dimer positions, were subsequently used as input for RESOLVE (27), which automatically constructed an initial Cα tracing of all six monomers present in the asymmetric unit. The density modification step with RESOLVE also produced a σA-weighted 2Fo − Fc map of excellent quality into which side chains were built manually with TURBO (28) (Fig. 1B) for one monomer. The relative positioning of all molecules within the asymmetric unit was then performed by molecular replacement (AmoRe; Ref. 29) using the first constructed monomer as search model.
Structure Refinement-The structural model of invertase was refined with REFMAC5 (30) with intermittent manual rebuilding and refining of individual B-factors after applying a TLS correction. Water molecules were added with ARP/wARP (30). The final model comprises 6 × protein residues 1-432 (2,592 amino acids), 12 SO₄²⁻ ions, one buffer molecule (sodium citrate), and a total of 1,754 water molecules, which led to R and Rfree values of 17.6 and 22%, respectively. A few residues lacked electron density and were therefore refined with occupancies of 0.5. One short surface loop (residues 96-100) was highly disordered and displayed clear density in only one of the six invertase molecules. Ramachandran statistics (PROCHECK) indicated that, for the overall structure of the six molecules present in the asymmetric unit, 87.1% of the residues are in the most favored region, and 12.6% are in additionally allowed regions. Details of refinement statistics are summarized in Table I.

RESULTS AND DISCUSSION
Overall Fold-The crystal structure of the T. maritima invertase (residues 1-432) has been solved by SAD phasing of the SeMet-substituted protein at a maximal resolution of 2 Å. The SeMet-substituted as well as the native crystals belong to space group P2₁ with unit cell parameters a = 94.2 Å, b = 113.2 Å, c = 129.6 Å, and β = 98.96°. The coordinates describing six copies of the invertase polypeptide chain and 1,754 water molecules per asymmetric unit were refined to final R and Rfree factors of 17.6 and 22%. One molecule of invertase is composed of two individual modules, namely a five-bladed β-propeller (residues 1-295) catalytic module linked to a C-terminal β-sandwich module (residues 306-432) by a 10-residue linker (Fig. 1). The ensemble of six bimodular molecules arranges into three individual dimers, each displaying 2-fold symmetry. The three dimers are not related by any point group symmetry but by non-symmetrical rotations and translations. The dimer arranges around a pseudo 2-fold axis, bringing the β-sandwich domain of monomer A in contact with the β-propeller domain of monomer B and vice versa. Upon purification, the enzyme had an elution profile corresponding to a size of ~30 kDa. Nonspecific interaction of the enzyme with the Sephadex column may explain this elution behavior. The same behavior has already been observed previously (21), and, therefore, the oligomeric state cannot be determined by this method. Preliminary investigations by dynamic light scattering indicated that the T. maritima invertase is a monomer in solution (data not shown). Several oligomeric states have been reported for invertases of various sources (31,32). Yeast invertase displays a dimeric substructure that may form even larger oligomers upon mannose binding (33). The overall monomer structure of T. maritima invertase has an elliptical shape with approximate dimensions of 63 × 43 × 45 Å with a negatively charged surface depression at the center of the β-propeller.
FIG. 2. Sequence alignment of a selection of family GH32 invertases. The sequences are identified as follows: Tmar_inv, T. maritima invertase (Swiss-Prot O33833); Ecol_inv, E. coli K12 invertase (Swiss-Prot P16553); Smut_inv, Streptococcus mutans GS-5 invertase (Swiss-Prot P13522); Zmai_inv, Zea mays invertase (Swiss-Prot O81189); Atha_inv, Arabidopsis thaliana Landsberg erecta (GenBank™ BAA89048.1); Scer_inv1, Saccharomyces cerevisiae invertase 1 (Swiss-Prot P10594); and Scer_inv4, S. cerevisiae invertase 4 (Swiss-Prot P10596). The boxes shaded in red are strictly conserved residues, whereas the boxes shaded in light blue concern highly similar sequence regions. The sequence numbering and secondary structure elements (the color codes of the secondary structure elements are the same as in Fig. 1) correspond to the sequence of T. maritima invertase. The highly conserved motifs A through F, as defined by Pons et al. (13), are highlighted by left and right arrows above the sequences. The alignment was produced with ClustalX (46), and the figure was produced with ALSCRIPT (47).
The clearly defined electron density revealed two amino acids in conflict with the GenBank™ sequence (A108 → V108 and V179 → A179). Therefore, the nucleotide sequence was checked twice (amplification from genomic DNA and the expression clone), and the two single base differences (C323 → T323 and T536 → C536) were only detected for the expression clone. As a consequence, these mismatches are attributed to misincorporation by the polymerase Pfx. Nevertheless, activity tests (see "Experimental Procedures") indicated that these mutations do not affect the enzymatic activity.
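The correspondence between the nucleotide positions and the residue numbers of the two substitutions can be verified with simple codon arithmetic. The following is a minimal sketch that assumes 1-based nucleotide numbering starting at the first base of the start codon and residue 1 as the first encoded amino acid; the function name is illustrative.

```python
def codon_of(nt_position):
    """Map a 1-based coding-sequence position to (residue number, position in codon)."""
    if nt_position < 1:
        raise ValueError("nucleotide positions are 1-based")
    residue = (nt_position - 1) // 3 + 1
    position_in_codon = (nt_position - 1) % 3 + 1
    return residue, position_in_codon


for nt in (323, 536):
    residue, pos = codon_of(nt)
    print(f"nucleotide {nt}: residue {residue}, codon position {pos}")
```

Both positions fall on the second base of their codons (residues 108 and 179), which is consistent with Ala/Val exchanges, because the alanine (GCN) and valine (GTN) codons differ only at that base.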
A five-bladed β-propeller structure was first reported for tachylectin (15) and was found more recently for the enzymes α-L-arabinanase (16) and levansucrase (14) of the glycoside hydrolase families GH43 and GH68, respectively. Highly similar to the families GH43 and GH68 structures, the five β-sheets of invertase, labeled I-V (Fig. 1), adopt the classical "W" topology of four antiparallel β-strands. The N-terminal second strand lines the central cavity, and the C-terminal last strand is at the periphery, to which the β-sandwich module is connected by a short linker. Interestingly, and in contrast to levansucrase and α-L-arabinanase, the five-bladed β-propeller of invertase does possess the short "molecular Velcro" that is typical of six- and seven-bladed β-propellers (15,34,35). The N-terminal first strand forms the outermost β-strand of the C-terminal blade V; however, only one hydrogen bond is formed across the sheet (Phe-8 O-Met-277 N, 2.88 Å). A similar short Velcro has also been observed in the six-bladed β-propeller of Vibrio cholerae sialidase (36). As in all β-propeller structures, the β-strands forming the blades are strongly twisted, giving an angle of ~90° between the first and last β-strand of a blade. Insertions are common in this type of β-propeller, and, likewise, short stretches of 3₁₀-helices are found inserted between several individual β-strands of the structure described here. They are, however, less extended than in the GH68 levansucrase, and from this perspective the β-propeller of invertase more closely resembles that of GH43 α-L-arabinanase.
The Catalytic Active Site-The catalytic active site is positioned at one end of the cavity at the center of the β-propeller with a funnel-like opening toward the molecular surface. It clearly has a pocket topology, which is fully consistent with the strict exo mode of action of the enzyme on the fructose polymer inulin (21). The three carboxylate groups of two aspartate (Asp-17 and Asp-138) residues and one glutamate (Glu-190) residue point to the center of the depression and generate a high negative charge at the active site. Reddy and Maley have shown that Asp-23 in yeast invertase (Asp-17 in T. maritima invertase) is the catalytic nucleophile (8), whereas Glu-204 (here Glu-190) is the general acid/base (9). In addition to the two regions containing the catalytic machinery, multiple sequence alignments of the GH32 family (Fig. 2) have revealed a number of other highly conserved amino acid stretches (13,37). The inspection of the three-dimensional structure allows us to define possible roles for these highly conserved residues. For the family GH68 levansucrase, the sucrose complex of an inactive mutant has also been reported (14). Because the catalytic modules of invertase and levansucrase are structurally related, the superimposition of invertase with the sucrose-containing complex of levansucrase (PDB identification code 1PT2) allows us, by similarity, to infer the position of a sucrose molecule and model it in the active site of invertase (Fig. 3A). The crystal structure of invertase revealed a glycerol molecule, present in the substrate binding site, that mimicked the O4′ and O6′ hydroxyl groups of the substrate fructose moiety (Fig. 3B), and this helped define the precise position of the modeled sucrose molecule. This model shows that the second strictly conserved aspartate residue in motif D (for motif definitions see Ref.  and Fig. 2), Asp-138 in T. maritima invertase, forms hydrogen bonds to O3 and O4 of the fructose unit, whereas the neighboring Arg-139 is hydrogen-bonded to the glucose O4. Apparently, the pair of strictly conserved residues, "RD," binds to characteristic hydroxyl groups of the substrate and, therefore, most likely plays a crucial role in substrate binding and recognition. Interestingly, the enzymes of family GH68, which hydrolyze the same substrates, also have the highly conserved motif "RDP," whereas GH43 and GH62, which have a structurally is interesting to note that enzymes of family GH68 have an arginine replacing this cysteine, although they cleave highly similar substrates. The importance of these differences for binding, recognition, and catalysis will be investigated in the future by a study of inactivated invertase mutants in complex with oligosaccharides. See Table II for a comparison of hydrogen bonding and close contacts between modeled sucrose and invertase active site residues.
Structural Relationship to Families GH68 and GH43 Five-bladed β-Propellers-Based on detailed sequence analyses, a structural relationship between families GH32, GH43, GH62, and GH68 has been predicted (13,17). The common five-bladed β-propeller fold, recently revealed by the structure determinations of members of family GH68 (14) and GH43 (16), confirmed this structural relationship. The crystal structure of invertase now proves that the catalytic modules of family GH32 enzymes also display the same five-bladed β-propeller fold. The superimposition of the catalytic module of invertase onto the two other enzymes leads to an overall root mean square deviation of 3.24 Å for 306 Cα atoms in the case of the family GH43 α-L-arabinanase and 3 Å for 359 Cα atoms in the case of the family GH68 levansucrase. Whereas levansucrase and invertase both retain the anomeric configuration at the site of cleavage, α-L-arabinanase is an inverting enzyme (18-20). The most widely accepted (and documented) view of the difference between the catalytic machineries of retaining and inverting glycosidases is that, in the former, the two catalytic amino acids are ~5.5 Å apart, and in the latter this distance is generally ~9 Å, with the exception of β-helical enzymes such as polygalacturonase or ι-carrageenase (38,39). Remarkably, the three invariant amino acids Asp-17, Asp-138, and Glu-190 in GH32, defined as the catalytic residues in each of the families GH32, GH68, and GH43, superimpose rather well in all three enzyme structures (Fig. 4A), showing that the relatedness is not solely with the fold but also with the catalytic machinery. The structural superposition shows that there is no difference in the distances of the catalytic residues relative to each other, as has generally been observed in inverting versus retaining glycoside hydrolases (10,12,40,41). Instead, it is the difference in the binding position of the sugar in the −1 subsite (subsite nomenclature of Davies et al.; Ref. 48) that makes the difference in the catalytic mechanism of invertase and levansucrase on the one hand and α-L-arabinanase on the other. The arrangement of the loops and residues surrounding the catalytic machinery in α-L-arabinanase is such that the arabinosyl moiety in the −1 subsite is bound in a position almost perpendicular to the fructofuranosyl moiety in invertase and levansucrase. As a consequence of this different binding, the nucleophilic residues are only ~3.6 Å from the sugar C1 atom in invertase and levansucrase (14), whereas the distance C1-Asp-38 in α-L-arabinanase is 6 Å, leaving room for a water molecule (16) (Fig. 4B). This different binding mode of the "glycone" part of the substrate fully explains the opposite stereochemical outcome of the reaction, despite a perfectly superimposable catalytic machinery.
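For readers less familiar with the root mean square deviations quoted for the superimpositions, the quantity is simply the square root of the mean squared distance between paired Cα atoms. The sketch below is illustrative only: it assumes the two coordinate sets are already superimposed and paired atom by atom (it performs no fitting), and the coordinates in the usage example are toy values, not taken from any of the structures discussed.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation (same units as input) between paired 3D points.

    Assumes the two sets are already superimposed and listed in matching order.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Toy example: every paired atom is displaced by 3 Å along z.
toy_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
toy_b = [(0.0, 0.0, 3.0), (1.0, 0.0, 3.0), (0.0, 2.0, 3.0)]
print(f"RMSD = {rmsd(toy_a, toy_b):.2f} Å")
```

In practice the value depends on which atom pairs are included and on the superposition procedure, so deviations computed by different programs are not directly comparable.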
The β-Sandwich Module-The C-terminal residues (from 306 to 432) of T. maritima invertase compose an individually folded β-sandwich consisting of two sheets of six β-strands. This module is connected to the catalytic module via a short, 10-residue-long linker region that is wrapped around the β-sandwich. Contrary to the catalytic module, which can be readily aligned with all other members of glycosidase family GH32, BLAST searches conducted with the C-terminal module of T. maritima invertase did not reveal a statistically significant sequence similarity with the equivalent regions in other family GH32 proteins. To detect possible relatedness beyond the detection level of BLAST, we have removed the easily identifiable catalytic domain region in all complete family GH32 members and constructed a sequence library with the remaining C-terminal regions. PSI-BLAST searches conducted starting with the C-terminal region of plant or fungal or bacterial family GH32 members picked the T. maritima C-terminal domain after a few iterations, indicating that all GH32 family members will also be appended to a β-sandwich domain, such as that of T. maritima invertase.
The alignment of this module with the programs DALI (42) and 3D-PSSM (43) onto other β-sandwich structures revealed structural similarities with the β-sandwich in galectins, the Charcot-Leyden crystal protein, carbohydrate binding modules (CBMs), and other more distant proteins like lectins and exotoxin A. The highest similarity is observed with the human galectin-3 (Protein Data Bank identification, 1A3K; DALI Z-score, 10.9; root mean square deviation for 127 Cα atoms is 2.4 Å) (Fig. 5) and with the Charcot-Leyden crystal protein (Protein Data Bank identification, 1CLC; DALI Z-score, 10.7; root mean square deviation for 132 Cα atoms is 2.6 Å), which has recently been found to be a mannose binding galectin (44). It is interesting to note that six-bladed β-propeller glycosidases such as Micromonospora viridifaciens and V. cholerae sialidases have also been found appended with lectin-like domains (36,45).
It has been observed that extracellular yeast invertase, a functionally active homodimer in solution, acquires mannose to self-assemble into higher oligomers upon transport and secretion (33). It is therefore tempting to postulate that the supplementary β-sandwich module of yeast invertase plays the role of a carbohydrate recognition domain involved in the higher oligomer formation. The distant similarity of the C-terminal module of T. maritima invertase, compared with the other members of the GH32 family, suggests that this module has perhaps lost this function in T. maritima invertase. Alternatively, this module might have evolved in T. maritima invertase to preserve stability at high temperature, even if its ancestral function has been lost. Proteins from hyperthermophilic organisms frequently adopt a modular as well as a multimeric structure. These two complementary features are thought to increase stability at high temperature by masking weak regions at the surface of the protein.