Association of Novel Domain in Active Site of Archaic Hyperthermophilic Maltogenic Amylase from Staphylothermus marinus*

Background: Maltogenic amylases that are known to date form dimers to perform hydrolysis.
Results: The structure of maltogenic amylase from Staphylothermus showed a novel domain at the N terminus associated with the active site.
Conclusion: Staphylothermus amylase has all of its substrate-binding structural components in a single monomer.
Significance: This is the first report of the newly observed domain arrangement adopted by hyperthermophilic archaic maltogenic amylase.

Staphylothermus marinus maltogenic amylase (SMMA) is a novel extreme thermophile maltogenic amylase with an optimal temperature of 100 °C, which hydrolyzes α-(1–4)-glycosyl linkages in cyclodextrins and in linear malto-oligosaccharides. This enzyme has a long N-terminal extension that is conserved among archaic hyperthermophilic amylases but is not found in other hydrolyzing enzymes from the glycoside hydrolase 13 family. The SMMA crystal structure revealed that the N-terminal extension forms an N′ domain that is similar to carbohydrate-binding module 48, with the strand-loop-strand region forming a part of the substrate binding pocket with several aromatic residues, including Phe-95, Phe-96, and Tyr-99. A structural comparison with conventional cyclodextrin-hydrolyzing enzymes revealed a striking resemblance between the SMMA N′ domain position and the dimeric N domain position in bacterial enzymes. This result suggests that extremophilic archaea that live at high temperatures may have adopted a novel domain arrangement that combines all of the substrate binding components within a monomeric subunit. The SMMA structure provides a molecular basis for the functional properties that are unique to hyperthermophile maltogenic amylases from archaea and that distinguish SMMA from moderate thermophilic or mesophilic bacterial enzymes.

Many organisms live in physically or geochemically extreme conditions that are detrimental to most organisms. Most eukaryotic organisms cannot tolerate temperatures higher than 50°C, due to the sensitivity of certain cellular components. The discovery of hyperthermophilic microorganisms living near, at, and above 100°C has revolutionized scientific thought in this area, and many scientists have become interested in such systems because they have developed specific physicochemical characteristics. Due to these particular properties, enzymes from extremophiles offer great potential for basic research and for biotechnological applications. For example, most industrial starch processes require high temperatures for liquefaction and saccharification. Therefore, there is enthusiastic interest in finding new sources of thermostable amylolytic enzymes.
Recently, we reported a novel maltogenic amylase from Staphylothermus marinus (1) that was isolated from geothermal sediments from a "black smoker" on the ocean floor (2). Maltogenic amylase is an enzyme that is widely used in the starch industry. This enzyme exhibits dual activity for α-D-1,4- and α-D-1,6-glucosidic bond cleavage, which differs from the classic α-amylases in the glycoside hydrolase 13 (GH13) family (3). SMMA has an optimal temperature of 100°C and a melting temperature of 109°C, with enzymatic activity under acidic conditions (pH 3.5-5.0), which is a favorable property for industrial applications (2,4). The conversion of starch at a high temperature and low pH offers several advantages, including higher substrate solubility, decreased viscosity, better bacterial decontamination, and increased reaction rates (4-8).
Interestingly, the primary structure analysis of SMMA revealed that most thermostable maltogenic amylases from archaea, such as S. marinus (1), Thermofilum pendens Hrk5 (17), Thermoplasma volcanium GSS1 (18), and Pyrococcus furiosus (19), have a longer N-terminal region of 220-250 amino acids. In contrast, bacterial maltogenic amylases and cyclodextrin (CD)-hydrolyzing enzymes have N-terminal regions of only 120-140 amino acids, which the CAZy (Carbohydrate-Active Enzymes) server classifies as carbohydrate-binding module (CBM) family 34, based on the observation that this domain from the related α-amylase I (TVAI) in T. vulgaris binds carbohydrates (20,21).
Because it is the most thermostable maltogenic amylase yet reported, we studied the SMMA structure for detailed information about its function. In this study, we show the three-dimensional structure of the enzyme, which reveals a unique domain arrangement in the active site associated with the N-terminal region that distinguishes the archaeal maltogenic amylase from classic bacterial maltogenic amylases.
Crystallization and Data Collection-SMMA crystallization trials were conducted using the sitting drop method at 18°C. We mixed 1.5 μl of a 14 mg/ml SMMA solution with an equal volume of crystallization reservoir solution containing 12% polyethylene glycol (PEG) 4000, 2% isopropyl alcohol, 0.1 M ADA, pH 6.5, and 0.1 M Li2SO4. Before data collection, rhombus-type crystals were cryocooled to 95 K using a cryoprotectant consisting of mother liquor supplemented with 25% glycerol. The crystal diffracted to a resolution of 2.28 Å, and the data were collected with a 1° rotation per frame for a total of 340 frames.
Structure Determination and Refinement-Diffraction data were processed and scaled using HKL2000 (22). The structure was determined by molecular replacement using Phaser in the CCP4 suite (23), with the neopullulanase from Bacillus stearothermophilus (Protein Data Bank entry 1J0H), with its N domain omitted, as the search model. The resulting model was refined through model rebuilding using CNS (24). COOT (25) was used for stereographic manual refinement and model building. The structure was validated with PROCHECK (26). Structure-based sequence alignments were generated using ClustalW (27). Molecular images, including schematics and stick figures, were produced using PyMOL (28). The detailed statistics for data collection and refinement are listed in Table 1.
Molecular Modeling-The SMMA and β-cyclodextrin ligand complex model was constructed by overlaying the SMMA structure onto the ThMA complex structure (Protein Data Bank entry 1GVI). The substrate was optimized manually prior to energy minimization by using the steepest descent method with an 8-Å cut-off for 300 iterations using InsightII (Accelrys, San Diego, CA).

Ramachandran plot (%)
Most favored region: 1293 (94.6%)
Additionally allowed region: 74 (5.4%)
Outlier region: 0 (0%)
a Numbers in parentheses are statistics from the highest resolution shell.
b $R_{\mathrm{sym}} = \sum |I_{\mathrm{obs}} - I_{\mathrm{avg}}| / \sum I_{\mathrm{obs}}$, where $I_{\mathrm{obs}}$ is the observed individual reflection, and $I_{\mathrm{avg}}$ is the average over all symmetry equivalents.
c $R = \sum ||F_o| - |F_c|| / \sum |F_o|$, where $F_o$ and $F_c$ are the observed and calculated structure factor amplitudes, respectively. $R_{\mathrm{free}}$ was calculated using 5% of the data.
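The R factors referred to in these footnotes follow the conventional crystallographic definition, which can be illustrated with a minimal Python sketch. The function names `r_factor` and `split_free`, the random seed, and the example amplitudes are hypothetical and are not taken from the study; the refinement programs used by the authors compute these quantities internally.

```python
import random

def r_factor(f_obs, f_calc):
    """Conventional R = sum(| |Fo| - |Fc| |) / sum(|Fo|) over paired amplitudes."""
    num = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    den = sum(abs(fo) for fo in f_obs)
    return num / den

def split_free(n_reflections, fraction=0.05, seed=0):
    """Randomly set aside about `fraction` of reflection indices as a test set.

    Returns (work_indices, free_indices). The 5% default mirrors the fraction
    quoted in the footnote; the seed is an arbitrary illustrative choice.
    """
    n_free = max(1, round(n_reflections * fraction))
    free = set(random.Random(seed).sample(range(n_reflections), n_free))
    work = [i for i in range(n_reflections) if i not in free]
    return work, sorted(free)

# Made-up amplitudes, for illustration only
example_r = r_factor([10.0, 20.0], [9.0, 22.0])
```

Evaluating `r_factor` on the working indices gives the working R factor, and evaluating it on the held-out indices, which are never used in refinement, gives the free R factor.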
Kinetic Parameters and Reaction Product Analysis-The copper bicinchoninate method was used to measure the concentration of reducing products to determine activity and kinetic parameters of SMMA (15,32). For thin layer chromatography (TLC) analysis, the reaction products were spotted onto Whatman K5F silica gel plates (Whatman plc, Maidstone, UK) and developed using isopropyl alcohol/ethyl acetate/water (3:1:1, v/v/v) as the solvent system.

Analysis of Surface Tyr and Trp Residues-Proximity to the end of a secondary structure element is defined as being within 4 amino acids for helix and within 1 amino acid for strand, where the termini are assigned from the Protein Data Bank (33). Structures used in the analysis are Protein Data Bank entries 1GVI (ThMA), 1J0H (NPase), 1JI2 (TVAII), and 1EA9 (CDase). Residues in the exposed surfaces were identified using the Areaimol program (34).
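The proximity rule described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `near_terminus`, the representation of secondary structure elements as (kind, start, end) tuples, the choice to count residues on either side of a terminus, and the example residue numbers are all hypothetical and are not taken from the study.

```python
# Hypothetical sketch of the proximity rule: a residue counts as "near the end"
# of a secondary structure element if it lies within 4 residues of a helix
# terminus or within 1 residue of a strand terminus.

WINDOW = {"helix": 4, "strand": 1}  # thresholds as stated in the text

def near_terminus(resnum, elements):
    """Return True if resnum is within the window of any element terminus.

    elements: iterable of (kind, start, end) tuples, with kind in WINDOW.
    Assumption: distance is counted on both sides of each terminus.
    """
    for kind, start, end in elements:
        window = WINDOW[kind]
        if abs(resnum - start) <= window or abs(resnum - end) <= window:
            return True
    return False

# Made-up elements for illustration (not SMMA annotations)
example_elements = [("helix", 10, 25), ("strand", 40, 46)]
```

In such a scheme, a Tyr or Trp residue would be counted only if it passes this test and is also flagged as solvent-exposed by a separate accessibility calculation.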

RESULTS
Overall Structure-The crystal structure revealed that SMMA comprises four domains: the N, catalytic, and C domains, which are observed in most CD-hydrolyzing enzymes, and an additional novel N-terminal domain, the N′ domain, which was first observed in this study (Fig. 1a). Initially, the structure was determined and refined to a 2.28 Å resolution using molecular replacement, with the catalytic and C domains of neopullulanase from B. stearothermophilus (Protein Data Bank entry 1J0H) as the template structure. An attempt at molecular replacement with the entire three-domain region failed, which may have been due to the significantly altered orientation and geometry of the SMMA N domain (r.m.s. deviation of 2.3 Å for 463 Cα atoms). During manual model building for the N-terminal region, the electron density of the region showed two vague, separate "blobs," which allowed the detection of the N (aa 116-219) and N′ (aa 1-115) domains of SMMA. The SMMA catalytic domain displays a conserved (β/α)8-barrel fold with a distinct loop (aa 342-397) protruding from the barrel. Most CD-hydrolyzing enzymes have a protruding loop located near the active site, which is called the B domain and forms a portion of the substrate binding groove for subsites −2, −3, or −4. However, SMMA has a much longer insertion of aa 342-397 in this region, creating a larger domain at the entrance of the groove (Fig. 1b). In this groove, Tyr-389, which is in the middle of a helix in aa 385-391, forms an entrance gate with Tyr-257. Because SMMA and other CD-hydrolyzing enzymes from archaea have this insertion, the protruding region might serve as a signature that is unique to maltogenic amylases from archaea. Another sequence-structural feature is the presence of a glycine in the i + 4 position after the catalytic nucleophile Asp-442 (Fig. 2), because typical maltogenic amylases (i.e. members of the neopullulanase subfamily) have a glutamate in that position as a part of the four-residue signature VANE (10).
FIGURE 1. Overall SMMA structure. a, a schematic overview of an SMMA monomer that shows the conserved N, catalytic, and C domains in CD-hydrolyzing enzymes with a novel N′ domain. The monomer is colored in a spectrum; the N terminus is in blue, and the C terminus is in red. b, the SMMA dimer structure is shown with a 2-fold axis perpendicular to the plane. c, the dimeric interface between the N′ domains is shown with the adjacent hydrophobic and charged interactions. d, a surface diagram of the monomer with a hypothetical cyclodextrin molecule (orange) is generated by superposition with the binary complex structure (Protein Data Bank entry 1GVI) to highlight the active site.
A Dali search (35) using the SMMA structure identified the neopullulanase from B. stearothermophilus (29% identity) as its closest structural homologue, with a 2.6 Å r.m.s. deviation (536 Cα atoms). The α-amylase II (TVAII) from T. vulgaris R-47 (30% identity) was identified as the second closest homologue with a 2.6 Å r.m.s. deviation (537 Cα atoms). The loop regions comprising aa 236-264 and 653-689 had high B-factors (61.3 average) at the surface, and the electron density map for the loop of aa 671-677 in the C domain was too weak to build a model, which suggests that it may be flexible. SMMA forms a homodimer via an interaction between the adjacent, novel N′ domains, which have a 2-fold axis perpendicular to the arc shape of the β-strands' interface (Fig. 1c). Each monomer is primarily associated through hydrophobic interactions at the center of the region of aa 5-19 (Ile-5 and Ile-19 from one molecule against Ile-9 from the other). This interaction is supplemented by salt bridges (Arg-181/Asp-422 and Arg-50/Glu-198) at both ends of the strands, which yield a 2140.7 Å² interface (Fig. 1d). Most CD-hydrolyzing enzymes form dimers with the N domain intertwined. However, the SMMA dimer configuration is different from previously reported CD-hydrolyzing enzymes, in that the dimer is arranged with adjacent monomers and an interface unrelated to the active sites.
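For readers unfamiliar with the r.m.s. deviation values quoted for the Cα atoms, the quantity itself can be written as a short Python sketch. It assumes that the two coordinate sets are already superimposed and paired atom by atom; structural alignment servers such as Dali determine the pairing and superposition themselves, which this sketch does not attempt. The function name `rmsd` and the example coordinates are hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two paired lists of (x, y, z) points.

    Assumption: the structures are already superimposed and the i-th point in
    coords_a corresponds to the i-th point in coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Made-up coordinates in Å, for illustration only
example_rmsd = rmsd([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
                    [(0.0, 0.0, 0.0), (1.0, 2.0, 0.0)])
```

Because the value depends on which atoms are paired, an r.m.s. deviation should always be read together with the number of Cα atoms over which it was calculated.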
CBM48 Topology of the N′ Domain-The long N-terminal region of SMMA includes two repeated motifs, the N and N′ domains, with a β-sandwich fold (6.2 Å r.m.s. deviation for 12 Cα atoms) (Fig. 3a). A structural homology search for the truncated N′ domain generated the β-subunit of the 5′-AMP-activated protein kinase (AMPK) with a 1.9 Å r.m.s. deviation over 80 residues (Z score, 10.2) using the Dali server and 1.71 Å over 80 residues (Z score, 6.2) using the SSM server (36). The AMPK β-subunit, which is known to bind glycogen, belongs to CBM48 (37). Currently, the CBM48 family has more than 3000 entries in the CAZy database, and 15 entries have structural information available (38). Despite the low overall sequence similarity, all of the structures superimposed well onto the SMMA N′ domain and share 6-8 β-strands from the β-sandwich fold, except for the long extended loop of aa 88-110 between the 7th and 8th β-strands in SMMA (Fig. 3b). Eight of 15 structures have protruding loops that correspond to the loop of aa 88-110 in SMMA, but the SMMA β-strand loop is much longer (8.2 Å) than the other corresponding loops. The AMPK β-subunit has a corresponding loop with the Leu-146 residue at the tip that interacts with cyclodextrin through several aromatic residues along the surfaces of the other β-strands to aid in the carbohydrate stacking interaction (37). In the SMMA N′ domain, Phe-95, Phe-96, and Tyr-99 are along the loop, and numerous aromatic residues, such as Tyr-43, Phe-46, and Phe-77, lie on the β-sandwich fold surfaces (Fig. 3d). Interestingly, the SMMA N′ domain is in the position of the N domain in other neopullulanase subunits, with its protruding loop being extended to the active site of the catalytic domain. With this orientation and geometry, the SMMA N′ domain forms part of the active site pocket via residues Phe-95, Phe-96, and Tyr-99, which make up the substrate binding groove.
Shaping Active Site with Novel N′ Domain-In SMMA, the substrate binding pocket was easily identified by the three highly conserved catalytic residues Asp-442, Glu-471, and Asp-536 at the bottom of the pocket (Fig. 4a). This active site pocket is located at the center of the (β/α)8 domain, is composed partly of the catalytic domain regions (residues 285-310, 397-417, 471-475, 534-546, and 568-594) and the protruding N′ domain loop, and is 14.8 Å deep, 18.33 Å long, and 9.1 Å wide (Fig. 4b). The positions and orientations of the Asp-442 nucleophile and the Glu-471 acid/base are in appropriate ranges for retaining enzymatic activity because the carboxylate group of Asp-536 is held in position by a hydrogen bond to the nitrogen atoms (Nδ1 and Nε2) in His-535 (Fig. 4a). Similarly, the oxygen atoms (Oδ1 and Oδ2) in the Asp-442 carboxylate group are hydrogen-bonded to His-336 (Nε2), Tyr-296, and Arg-440 (NH1), and these positions may be responsible for maintaining the ionization state of the nucleophile. The pocket contains several aromatic residues; Phe-404 and Phe-405 form subsite −1, and Tyr-296 provides the essential stacking interaction with the sugar ring at subsite −1 that is conserved in GH13 and forms a hydrogen bond with the nucleophile Asp-442.
The active site pockets in bacterial CD-hydrolyzing enzymes are generated by the N domain from the other subunit and yield a groove that is slightly extended between the catalytic domain and the N domain, probably above subsites +3 and +4 (Fig. 4d). In comparison, the SMMA active site pocket is generated by the N′ domain of the same subunit without a significant groove above the subsites +3 and +4, at which the position of the putative subsite +3 is filled and blocked by Phe-405 and Phe-77 from the N′ domain (Fig. 4c). In addition, the N domain in bacterial CD-hydrolyzing enzymes protrudes above the surface and forms a structural lid for the active site pocket, whereas the SMMA N′ domain does not have the structural lid, which may explain the lack of SMMA transfer activity. Glu-332, which is known to be a key residue in ThMA transglycosylation activity, is replaced by Gly in SMMA (39). Because the active site pocket is significantly altered in SMMA, we have examined the degradation pattern with the long-chain substrates amylose and amylopectin. As shown in Fig. 5, SMMA hydrolyzes both amylose and amylopectin to produce maltose and glucose exclusively. A kinetic analysis in Table 2 showed that SMMA has a lower Km value for the amylose substrate (0.27) than for amylopectin (0.51), a pattern similar to that of classic maltogenic amylases. However, the Km values for both substrates were significantly lower than those of some maltogenic enzymes. For the amylose substrate, the Km is 5-fold lower than that of CDase (0.27 versus 1.51) and 70-fold lower than that of ThMA (0.27 versus 19.7). For amylopectin, it is roughly 2 orders of magnitude lower than those of CDase and ThMA (0.51 versus 55.15 and 0.51 versus 44.5, respectively), whereas the specificity constant kcat/Km is similar (1.17 versus 0.92 and 1.17 versus 3.03, respectively), which suggests that SMMA has a high affinity for polysaccharide substrates.
SMMA showed a uniquely higher kcat value for the γ-CD substrate than for α-CD and β-CD.
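The fold differences in Km quoted in this comparison can be checked with simple arithmetic. The following Python sketch uses only the Km values stated in the text (units as given in Table 2); the dictionary layout and the function name `fold_lower` are illustrative choices and not part of the original analysis.

```python
# Km values as quoted in the text (units as in Table 2)
km = {
    "amylose": {"SMMA": 0.27, "CDase": 1.51, "ThMA": 19.7},
    "amylopectin": {"SMMA": 0.51, "CDase": 55.15, "ThMA": 44.5},
}

def fold_lower(substrate, other):
    """Ratio of the Km of `other` to the Km of SMMA for one substrate."""
    return km[substrate][other] / km[substrate]["SMMA"]

for substrate in km:
    for other in ("CDase", "ThMA"):
        ratio = fold_lower(substrate, other)
        print(f"{substrate:12s} SMMA vs {other}: {ratio:.1f}-fold lower Km")
```

The ratios come out at about 5.6 and 73 for amylose and about 108 and 87 for amylopectin, in line with the rounded fold differences described in the text.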
SMMA Thermostability-The reported SMMA melting temperature is 109°C; thus, it is the most thermostable maltogenic amylase reported to date. Previously, a high frequency of surface hydrophobic residues with bulky side chains, such as Trp and Tyr, at the α-helix and β-strand termini was reported to be related to the thermostability of many proteins (33). Thus, we have investigated the distribution of Trp and Tyr in comparison with CDase, ThMA, neopullulanase, and TVAII. The structural analysis showed that a significantly high proportion of Tyr and Trp residues are located on the solvent-accessible surface throughout the entire molecule (Table 3). Although the total number of Tyr and Trp residues is not significantly higher than in the other enzymes, 67 (9.6%) compared with 44 (7.7%), these residues are primarily located on the surface in SMMA. Fig. 6 shows that the relative surface distribution of Tyr and Trp at the termini of the secondary structure is much wider than in those bacterial enzymes. Many extremophiles use sugars as a compatible solute to ease osmotic stress, and such solutes stabilize proteins, which may explain the abundance of aromatic residues that can bind carbohydrate sugars at the protein surface (33, 40-42). Amino acid analysis showed that SMMA has a significantly higher frequency of Ile (74, 10.6%) than do the other four enzymes (average 24, 4.0%), but it has significantly lower Gln content (9, 1.3%). Ile occurs often in thermophilic proteins, and Gln is a labile amino acid that is deaminated at high temperatures, which suggests that known structural fac-

DISCUSSION
Certain CD-hydrolyzing amylases have much longer extended regions in the N terminus, despite their high sequence homology with the remaining classic CD-hydrolyzing enzymes. Thus, the functional role of this region in amylase activity is especially interesting. Given the high sequence homology of this extended region among archaic enzymes, it is likely that the CD-hydrolyzing enzymes from hyperthermophilic archaea, such as Staphylothermus hellenicus (46), Thermococcus barophilus (47), Thermococcus kodakarensis (48), Thermococcus onnurineus (49), T. pendens (50), T. volcanium GSS1 (18), and P. furiosus (51), have extra domains with the CBM48 fold.
CBMs have been divided into 64 families based on amino acid sequence similarity (38). Previously, CBMs were thought to be motifs that are functionally independent from the catalytic domains located distal to the active sites, and it has been proposed that they enhance the interaction between the carbohydrate substrate and protein (52-55). For many amylolytic enzymes, CBMs were reportedly involved in starch hydrolysis by disrupting the starch granule structure, which allowed for a concentration of catalytic domains on the surface and carbohydrate starch hydrolysis by proximity (56-59). For example, CBM20 in CGTases is involved in binding and guiding linear starch chains (60). Although the evolutionary history of CBM48 has been thoroughly elucidated, indicating that it reflects the evolution of specificities, rather than the evolution of species (61), the detailed function for CBM48 is unknown. The only exception is represented by the β-subunit of AMPK (37), which showed that CBM48 is a separate domain that binds cyclodextrin to increase its binding ability for the glucosyl polymeric structures commonly found in glycogen (62). Despite the wealth of structural information on CBM48 (36, 52, 55, 63-67; for a review, see Ref. 68), this paper is, to the best of our knowledge, the first to demonstrate that an independent CBM domain folds to interact with the catalytic domain and participates in substrate binding at the active site. While it is adjacent to the catalytic domain, CBM48 in the N′ domain contacts the active site with the long extended loop that was previously shown to interact with cyclodextrin molecules in the AMPK β-subunit (37). The SMMA N′ domain extended loop has several aromatic residues, which is a suitable architecture for a stacking interaction with carbohydrate sugar rings. The reduced Km values toward amylose and amylopectin reported in this study may reflect the characteristic binding of CBM to carbohydrates.
CBM48 function at the active site has been related to substrate specificity in the glycogen debranching enzymes from Deinococcus geothermalis and Deinococcus radiodurans (69). In these enzymes, CBM48 determines the branching pattern of glycogen at the active site, which suggests a functional role similar to that of the SMMA N′ domain. Thus, it would be interesting to relate the protein structure to its function (69).
Previously, we reported that substrate binding in ThMA was influenced by the direction and arrangement of several loops located in the N domain (15). The striking similarity of the position and orientation of the SMMA N′ domain to the N domain in bacterial maltogenic amylases in this study indicates a common functional role in the substrate binding at the active site (Fig. 7). Interestingly, the SMMA N domain lacks the corresponding functional residues, such as Tyr-45 and Trp-47, which are involved in substrate recognition in ThMA (15). We suggest that the SMMA N′ domain plays the functional role of the N domain in moderate thermophile or mesophile maltogenic amylases.
The interface between the N′ domain and other domains in the SMMA monomeric subunit showed much stronger and tighter interactions than did the interface of the N domain in bacterial maltogenic amylases. The N′ domain is primarily associated with a 1946.0 Å² buried interface via hydrophobic interactions, whereas the N domain in ThMA has a distribution of highly charged residues at the interface with a 1338.8 Å² buried surface. It has been reported that high salt concentrations change the oligomeric state of bacterial maltogenic amylases (70). Although oligomerization might not be necessary for the activity of the GH13 CD-hydrolyzing enzymes, it does contribute to their high specificity (71). The charged group in the hydrophobic interface may have direct effects on thermal resilience and serve as a doorway for water molecules. High temperatures would contribute to thermal destabilization by introducing water permeation, which would result in structure perturbation. Taken together, these observations suggest that a modified structure that positions all of the components for substrate recognition within a monomer may have been adopted to allow the archaea hyperthermophilic maltogenic enzymes to retain activity at high temperatures. The structural features of SMMA may also provide a molecular basis for engineering substrate preference and thermal stability in the starch-processing industry.
Residues at the termini of β-strands and α-helices are highlighted by cyan and magenta, respectively. The number of Tyr and Trp residues that are located at the termini of secondary structure elements and exposed at the surface is indicated.
a The temperature indicates the optimal temperatures for maximum enzyme activity.
b The number of total Tyr and Trp residues.
c The number of Tyr and Trp residues near the end of a secondary structure element.
d The number of Tyr and Trp residues near the end of a secondary structure element and exposed at the surface.
e The relative value of the surface number to the surface number of SMMA.