Active Site and Laminarin Binding in Glycoside Hydrolase Family 55*

Background: SacteLam55A is a GH55 enzyme from highly cellulolytic Streptomyces sp. SirexAA-E. Results: Substrate-bound structures identify residues involved in binding, catalysis, enforcement of reaction specificity, and possibly processivity. Conclusion: Natural GH55 are exo-β-1,3-glucanases with a broad range of temperature and pH optima. Significance: Experimental annotation of GH phylogenetic space by use of bioinformatics, high throughput cell-free translation, biochemical assay, and structure determination is feasible. The Carbohydrate Active Enzyme (CAZy) database indicates that glycoside hydrolase family 55 (GH55) contains both endo- and exo-β-1,3-glucanases. The founding structure in the GH55 is PcLam55A from the white rot fungus Phanerochaete chrysosporium (Ishida, T., Fushinobu, S., Kawai, R., Kitaoka, M., Igarashi, K., and Samejima, M. (2009) Crystal structure of glycoside hydrolase family 55 β-1,3-glucanase from the basidiomycete Phanerochaete chrysosporium. J. Biol. Chem. 284, 10100–10109). Here, we present high resolution crystal structures of bacterial SacteLam55A from the highly cellulolytic Streptomyces sp. SirexAA-E with bound substrates and product. These structures, along with mutagenesis and kinetic studies, implicate Glu-502 as the catalytic acid (as proposed earlier for Glu-663 in PcLam55A) and a proton relay network of four residues in activating water as the nucleophile. Further, a set of conserved aromatic residues that define the active site apparently enforce an exo-glucanase reactivity as demonstrated by exhaustive hydrolysis reactions with purified laminarioligosaccharides. Two additional aromatic residues that line the substrate-binding channel show substrate-dependent conformational flexibility that may promote processive reactivity of the bound oligosaccharide in the bacterial enzymes. Gene synthesis carried out on ∼30% of the GH55 family gave 34 active enzymes (19% functional coverage of the nonredundant members of GH55). 
These active enzymes reacted with only laminarin from a panel of 10 different soluble and insoluble polysaccharides and displayed a broad range of specific activities and optima for pH and temperature. Application of this experimental method provides a new, systematic way to annotate glycoside hydrolase phylogenetic space for functional properties.

Microbial communities that have symbiotic relationships with biomass-harvesting insects are now recognized to be a rich source of microbes with diverse metabolic and biosynthetic capabilities (1,2). In addition to the well documented cellulolytic activity of microbes that inhabit the digestive tracts of termites (3), newer paradigms for insect/microbe symbiotic interactions have been recently described in leaf cutter ants (1,2,4,5), pine beetles (6), and wood wasps (7). For example, the highly invasive pinewood-boring wasp Sirex noctilio deposits a fungal/bacterial community into the pine tree when it lays eggs (8). If left unchecked, this insect/microbe community can cause massive destruction of pine forests (9). The white rot fungus Amylostereum areolatum was the first member of this community to be identified (10). Recently, however, a highly cellulolytic and hemicellulolytic Actinomycete, Streptomyces sp. SirexAA-E (SirexAA-E), was isolated (8). Genomic, transcriptomic, and biochemical studies showed that this microbe secretes an array of glycoside hydrolases (GHs) (7,11,12) and oxidative enzymes including AA10 lytic polysaccharide monooxygenases (6), peroxidases (7), and the first characterized caffeoyl-CoA intradiol dioxygenase (13) during growth on plant biomass.
The gene SACTE_4363, encoding a GH family 55 protein described hereafter as SacteLam55A, was of interest because it was up-regulated and secreted into the medium when SirexAA-E was grown on cellobiose, xylan, and various pretreated switchgrass samples but was not detected in the secreted proteome when SirexAA-E was grown on glucose (7). Furthermore, although there are two genes encoding predicted GH55s in the SirexAA-E genome (SACTE_0324 and SACTE_4363), only SACTE_4363 (encoding SacteLam55A) was up-regulated during growth on biomass.
In this work, we used a combination of gene synthesis, cell-free protein translation, catalytic assays, and x-ray crystallography to provide a correlated biochemical and structural characterization of the GH55 family. High resolution structures of substrate- and product-bound SacteLam55A revealed an extended substrate-binding cleft with six or more subsites for sugar binding formed at the interface of two right-handed β-helical domains. Mutations guided by the structures support the identification of glutamate as the catalytic acid (Glu-502 in SacteLam55A) and a proton relay network that activates water to serve as the catalytic base. These structures also gave insight into residues that enforce an exolytic reaction, interactions that potentially support a processive hydrolysis of the bound oligosaccharide, and the ability of the enzyme to accommodate substrates with β-1,6 branching as observed in natural β-1,3-glucans from diverse sources.
Cloning, Expression, and Purification of SacteLam55A-The SACTE_4363 gene (Uniprot G2NFJ9 (28)) encodes a 605-residue polypeptide including a twin arginine translocation extracellular secretion signal peptide (29) and the catalytic portion of the protein. SACTE_4363 was cloned into the Escherichia coli expression vector pVP67K (Center for Eukaryotic Structural Genomics, Madison, WI) using a two-step PCR method. pVP67K, a derivative of pVP68K described previously (30-32), produces a fusion protein that contains a tobacco etch virus protease-cleavable N-terminal His8 tag. Polymerase incomplete primer extension (33) primer pairs were designed to amplify the GH55 domain without the predicted signal peptide, designated here as SacteLam55A (residues 45-605). The amplified DNA also contained partial overlap with flanking sequences that are present in pVP67K to assist in cloning (30). The amplification reaction mixture was transformed into E. coli 10G competent cells (Lucigen, Middleton, WI), and the nucleotide sequences of several recovered SacteLam55A-pVP67K plasmids were verified at the University of Wisconsin Biotechnology Center DNA sequencing facility. A sequence-verified plasmid was used as the starting template to generate nine additional plasmids containing mutations in the SacteLam55A gene by PCR-based mutagenesis using primer pairs that contain either single or double nucleotide substitutions. The incorporation of the desired mutations was verified by nucleotide sequencing.
Sequence-verified plasmids were transformed into either E. coli B834 (for selenomethionine labeling) or E. coli BL21 (DE3) for heterologous expression (31,34). A single colony from an overnight transformation was inoculated into 1 ml of chemically defined noninducing medium containing 50 mg/liter kanamycin and 34 mg/liter chloramphenicol and grown for 8 h at 37°C (31). The 1-ml starter culture was inoculated into 50 ml of the same noninducing medium. After 16 h at 25°C, the 50-ml culture was transferred into 1 liter of auto-induction medium containing 50 mg/liter kanamycin and 34 mg/liter chloramphenicol and incubated for 25 h at 25°C. The auto-induction medium used to express selenomethionine-labeled SacteLam55A contained 0.01 mg/ml of methionine (67 μM) and 0.125 mg/ml of selenomethionine (637 μM), whereas the medium used to express the unlabeled, mutated forms of SacteLam55A contained 0.2 mg/ml of methionine (1.34 mM). After the 25-h incubation in auto-induction medium, cells were harvested by centrifugation at 5,000 × g for 15 min and then stored at −80°C.
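The molar concentrations quoted for the amino acid supplements follow directly from the mass concentrations and the molecular weights. The following is a minimal sketch of that arithmetic, assuming molecular weights of 149.21 g/mol for methionine and 196.11 g/mol for selenomethionine (standard values that are not stated in the text); the function name is illustrative only. It shows that 67 μM methionine corresponds to 0.01 mg/ml.

```python
# Molecular weights in g/mol (assumed standard values, not given in the text).
MW_METHIONINE = 149.21
MW_SELENOMETHIONINE = 196.11


def mg_per_ml_to_micromolar(conc_mg_per_ml: float, mw_g_per_mol: float) -> float:
    """Convert a mass concentration (mg/ml) to a molar concentration (micromolar)."""
    # mg/ml is numerically equal to g/liter; dividing by g/mol gives mol/liter.
    molar = conc_mg_per_ml / mw_g_per_mol
    return molar * 1e6


if __name__ == "__main__":
    # Labeled medium: methionine, then selenomethionine.
    print(round(mg_per_ml_to_micromolar(0.01, MW_METHIONINE)))
    print(round(mg_per_ml_to_micromolar(0.125, MW_SELENOMETHIONINE)))
    # Unlabeled medium: methionine (about 1.34 mM).
    print(round(mg_per_ml_to_micromolar(0.2, MW_METHIONINE)))
```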
All mutated forms of SacteLam55A were purified using an identical protocol. Pelleted cells were resuspended in 100 mM MOPS, pH 7.5, containing 500 mM NaCl and 10 mM imidazole and lysed by sonication. The cell lysate was clarified by centrifugation at 20,000 × g for 60 min and loaded onto a 1.6-cm-diameter × 2.5-cm HisTrap HP column (GE Healthcare) equilibrated with 10 column volumes of 100 mM MOPS, pH 7.5, containing 500 mM NaCl and 10 mM imidazole. The proteins were eluted with 100 mM MOPS, pH 7.5, containing 500 mM NaCl and 500 mM imidazole. The purest fractions, as judged by visual inspection of an SDS-PAGE gel, were pooled and concentrated using a Vivaspin 20 (Sartorius AG, Germany). Residual imidazole was removed by loading the concentrated sample onto a 2.6-cm-diameter × 30-cm HisPrep desalting column (GE Healthcare) that had been equilibrated with 10 mM MOPS, pH 7.5, containing 50 mM NaCl. The N-terminal His8 tag was removed by treatment with tobacco etch virus protease, and the cleaved tag, uncleaved fusion enzyme, and tobacco etch virus protease were separated by subtractive nickel affinity chromatography (32). Tag-free SacteLam55A was concentrated to 20 mg/ml by ultrafiltration and exchanged into 10 mM MOPS, pH 7.0, containing 50 mM NaCl. The protein concentration was determined from absorbance at 280 nm using a molar absorptivity calculated from the amino acid sequence of the purified enzyme (35).
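The concentration determination at 280 nm can be illustrated as follows. This is a sketch under stated assumptions rather than the exact procedure of Ref. 35: it assumes the widely used composition-based estimate in which each tryptophan contributes 5500, each tyrosine 1490, and each disulfide 125 M⁻¹ cm⁻¹, followed by the Beer-Lambert law. The peptide sequence and absorbance reading are hypothetical and are not those of SacteLam55A.

```python
def molar_absorptivity_280(sequence: str, n_disulfides: int = 0) -> float:
    """Estimate the molar absorptivity at 280 nm (M^-1 cm^-1) from composition.

    Assumed per-residue contributions: Trp 5500, Tyr 1490, disulfide 125.
    """
    seq = sequence.upper()
    return 5500 * seq.count("W") + 1490 * seq.count("Y") + 125 * n_disulfides


def molar_concentration(a280: float, epsilon: float, path_cm: float = 1.0) -> float:
    """Beer-Lambert law, c = A / (epsilon * l), returned in mol/liter."""
    if epsilon <= 0:
        raise ValueError("no Trp, Tyr, or disulfide: A280 cannot be used")
    return a280 / (epsilon * path_cm)


if __name__ == "__main__":
    toy_sequence = "MWTYAGWKY"  # hypothetical peptide, not SacteLam55A
    eps = molar_absorptivity_280(toy_sequence)
    print(eps)
    print(molar_concentration(0.699, eps))
```

Multiplying the molar concentration by the molecular weight of the protein gives the mass concentration used elsewhere in the text (for example, mg/ml).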
Bioinformatics-GH55 sequences were obtained from the Carbohydrate-Active enZYme database (CAZy) (36). The GH55 catalytic domain was identified by alignment of the SacteLam55A and PcLam55A (20) structures. After the boundaries of the GH55 catalytic domain were identified, 177 nonredundant GH55 sequences were trimmed to contain only the catalytic domain, and phylogenetic trees were generated from amino acid sequence alignments using Clustal Omega and maximum likelihood methods (37). All percentages reported herein are relative to this nonredundant set of sequences.
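Because all percentages are expressed relative to the 177 nonredundant sequences, the functional coverage quoted in the abstract can be checked with a short calculation. The sketch below uses only numbers given in the text (177 nonredundant sequences, 34 active enzymes); the function name is illustrative.

```python
NONREDUNDANT_GH55 = 177  # size of the nonredundant GH55 set (from the text)
ACTIVE_ENZYMES = 34      # active enzymes obtained by gene synthesis (from the abstract)


def percent_of_family(count: int, total: int = NONREDUNDANT_GH55) -> float:
    """Express a count as a percentage of the nonredundant GH55 set."""
    return 100.0 * count / total


if __name__ == "__main__":
    # Functional coverage of the nonredundant family members.
    print(round(percent_of_family(ACTIVE_ENZYMES)))
```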
Gene Synthesis and Cell-free Protein Translation-GH55 family members were selected from phylogenetic trees, and the protein sequences were reverse translated with the goals of optimizing codon usage for protein expression and removal of sequence features commonly associated with gene synthesis failure (low/high GC, homopolymers, repeats, hairpins, etc.). The optimized sequences were then partitioned into intermediate fragments of <1000 bp and broken into overlapping oligonucleotides using GeneDesign (38). Oligonucleotides (150-mers on average) were then passed to an automated assembly pipeline that included pooling, dilution, and two rounds of polymerase chain assembly performed by acoustic deposition using the Echo 555 instrument (Labcyte, CA). The resulting intermediate fragments were verified for correct assembly and purified using Ranger technology (Coastal Genomics, Burnaby, Canada) and cloned into a standard entry vector. For every target DNA fragment, eight colonies were sequence-verified using PACBIO RSII (Pacific Biosciences, Menlo Park, CA) and analyzed for direction of insertion and mutations using an internal variant detection algorithm. For final gene assembly, the corresponding error-free fragments were PCR-amplified, purified, and assembled with pEU-JGI using Gibson assembly (39). pEU-JGI is a derivative of pEU-HSBC that contains SmaI restriction sites flanking the SgfI and PmeI delimited sacB/chloramphenicol acetyltransferase selection cassette (40). The full-length genes underwent an additional round of sequence verification, and the correct clones were archived in 96-well plates ready for DNA purification and cell-free translation. Sequence-verified plasmids were purified as previously described, and the purified plasmids were adjusted to 1 μg/μl for use in coupled cell-free transcription and translation reactions (40).
Translation reactions were analyzed by SDS-PAGE, and the concentration of each translated protein was quantitated using Gel Doc (Bio-Rad).
Enzyme Assays with Purified Enzyme-Purified SacteLam55A and mutated forms of the enzyme were screened for activity using the DNS assay with the substrates β-1,3-glucan, β-1,6-glucan, lichenan, and laminarin. The standard assay was performed in 100 μl of 50 mM phosphate buffer, pH 6.0, containing 10 mg/ml of the substrate at 50°C. The reaction was initiated by addition of ∼0.1 μg of purified enzyme and was incubated for up to 1 h at 50°C. Reducing end products were determined using the dinitrosalicylic acid (DNS) assay (40,41). After the desired time of reaction, 1 volume of the assay mixture was added to 2 volumes of DNS reagent, mixed, and then heated at 95°C for 5 min. The amount of released reducing sugar was determined by monitoring the absorbance at 540 nm using glucose as a standard.
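Quantifying reducing sugar against a glucose standard amounts to fitting a standard curve and reading unknowns from it. The following sketch assumes a linear response of absorbance at 540 nm to glucose over the range used; the standard amounts and absorbance readings are invented for illustration, and the function names are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def net_reducing_sugar(sample_a540, blank_a540, slope):
    """Glucose equivalents released, after subtracting a no-enzyme blank."""
    # The intercept of the standard curve cancels in the difference.
    return (sample_a540 - blank_a540) / slope


if __name__ == "__main__":
    glucose_umol = [0.0, 0.1, 0.2, 0.3, 0.4]   # hypothetical standards
    a540 = [0.05, 0.25, 0.45, 0.65, 0.85]      # hypothetical readings
    slope, intercept = fit_line(glucose_umol, a540)
    print(slope, intercept)
    print(net_reducing_sugar(0.55, 0.07, slope))
```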
Steady-state kinetic parameters were estimated using varied amounts of soluble laminarin as the substrate (0, 1, 2, 5, 10, 15, and 20 mg/ml). In these reactions, ∼0.02-0.1 μg of enzyme was mixed with varied amounts of laminarin up to 20 mg/ml in 100 μl of 50 mM phosphate buffer, pH 6.0, at 50°C. Samples were taken from the assay mixture at 0, 1, 2, 4, 8, and 10 min to estimate the initial rate for reducing sugar formation, v, at each substrate concentration and to assess the time period over which the reaction rate was linear. A blank containing no enzyme was also prepared and used as a correction for the absorption of the DNS reagent and nonenzymatic hydrolysis. The initial rate, v, was determined from 2-min reactions for SacteLam55A carried out in triplicate, and reactions up to 10 min in duration were carried out in triplicate for mutated forms of the enzyme. In the reaction conditions described above, ∼2-10% of the total substrate was converted to reducing sugar products depending on the activity of the enzyme. For the lowest activity mutated forms of SacteLam55A, the concentration of enzyme used in the reaction was increased by ∼5-fold relative to SacteLam55A. Kinetic parameters were estimated by nonlinear least squares fitting to the expression $v = V_{\max}[S]/(K_m + [S])$ using Prism 6.0 (GraphPad, La Jolla, CA).
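To make the fitting step concrete, the sketch below estimates Vmax and Km by least squares using only the Python standard library. It is a stand-in for the commercial package named above, not a reproduction of its algorithm: for a fixed Km the model is linear in Vmax, so Vmax is solved in closed form, and Km is located by a golden-section search that assumes the sum of squared residuals has a single minimum over the search range. The rates used in the example are synthetic.

```python
import math


def _best_vmax(km, s, v):
    """For a fixed Km the model is linear in Vmax, so solve for it directly."""
    f = [si / (km + si) for si in s]
    return sum(vi * fi for vi, fi in zip(v, f)) / sum(fi * fi for fi in f)


def _sse(km, s, v):
    """Sum of squared residuals at the best Vmax for this Km."""
    vmax = _best_vmax(km, s, v)
    return sum((vi - vmax * si / (km + si)) ** 2 for si, vi in zip(s, v))


def fit_michaelis_menten(s, v, km_lo=1e-3, km_hi=1e3, iterations=200):
    """Least-squares estimate of (Vmax, Km) by golden-section search over Km."""
    ratio = (math.sqrt(5) - 1) / 2
    # Search in log space so small and large Km values are treated evenly.
    a, b = math.log(km_lo), math.log(km_hi)
    for _ in range(iterations):
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if _sse(math.exp(c), s, v) < _sse(math.exp(d), s, v):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    km = math.exp((a + b) / 2)
    return _best_vmax(km, s, v), km


if __name__ == "__main__":
    laminarin = [0, 1, 2, 5, 10, 15, 20]                    # mg/ml, as in the text
    rates = [100.0 * si / (0.9 + si) for si in laminarin]   # synthetic, noise-free
    print(fit_michaelis_menten(laminarin, rates))
```

With real triplicate data, each replicate rate would be passed as its own (concentration, rate) pair, and the search bounds would need to bracket the expected Km.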
End products (glucose and L2) from complete hydrolysis of pure laminarioligosaccharides L2-L5 were separated by HPLC using a Rezex RPM-oligosaccharide column (200 × 10 mm) (Phenomenex, Torrance, CA) and detected by change in refractive index. Degassed Milli-Q water was used as the mobile phase with a flow rate of 0.3 ml/min at 85°C. The area of each product peak was obtained by integration using EZstart software (Shimadzu Scientific Instruments, Columbia, MD) and compared with the areas given by injection of known amounts of glucose and L2 to estimate the molar amounts of products formed.
To determine the activity of the GH55 enzymes produced by cell-free translation (40), 10 μl of an individual translation reaction was added to a total reaction volume of 100 μl containing 10 mg/ml of either Sigmacell-20, β-1,3-glucan, β-1,6-glucan, lichenan, laminarin, beechwood xylan, arabinoxylan, D-mannan, or locust bean gum in 50 mM phosphate buffer, pH 6.0. The reactions were carried out at 50°C for up to 24 h. A unit of activity is defined as the release of 1 μmol of reducing sugar per minute from laminarin as determined by using the DNS assay (40,41). For calculation of specific activity, the concentration of each translated protein was determined using SDS-PAGE and Gel Doc (Bio-Rad).
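The definition of a unit leads directly to the specific activity calculation. The sketch below assumes the unit is 1 μmol of reducing sugar released per minute and that specific activity is expressed per milligram of enzyme; the numerical inputs are hypothetical and the function name is illustrative.

```python
def specific_activity(umol_reducing_sugar: float, minutes: float, enzyme_mg: float) -> float:
    """Specific activity in units per mg, with 1 unit = 1 umol reducing sugar per min."""
    if minutes <= 0 or enzyme_mg <= 0:
        raise ValueError("reaction time and enzyme mass must be positive")
    units = umol_reducing_sugar / minutes
    return units / enzyme_mg


if __name__ == "__main__":
    # Hypothetical example: 0.05 umol released in 10 min by 0.1 ug (0.0001 mg) of enzyme.
    print(specific_activity(0.05, 10.0, 0.0001))
```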
Temperature optima were determined by carrying out enzyme reactions from 30 to 90°C in 10°C increments. The reaction was started by mixing 4 μl of an individual translation reaction with 10 mg/ml of laminarin in 50 mM phosphate buffer, pH 6.0. After 10 min, reducing end products were determined using the DNS assay (40,41). The pH optima were determined at 50°C using the same enzyme and substrate concentrations as above for the temperature optima studies, but the buffer was varied as follows: 50 mM sodium citrate (pH 4.0 and 5.0); 50 mM phosphate (pH 6.0); Tris-HCl (pH 7.0 and 8.0); 50 mM CHES (pH 9.0 and 10.0); and 50 mM CAPS (pH 11.0). After the temperature optimum was determined, specific activities were estimated by reacting 4 μl of translation reaction with 10 mg/ml laminarin in 50 mM phosphate buffer, pH 6.0, for 10 min. The concentration of enzyme in the translation reactions was calculated using SDS-PAGE and Gel Doc (Bio-Rad). Enzymes that did not show activity after 10 min were incubated up to 24 h and then assayed for products.
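Identifying an optimum from such a series, and the range over which most of the activity is retained, can be sketched as below. The 70% threshold mirrors the way the Results summarize SacteLam55A; the activity values are invented for illustration, and the function name is hypothetical.

```python
def optimum_and_range(activity_by_condition, threshold=0.7):
    """Return the condition with maximal activity and all conditions at or
    above `threshold` times that maximum, sorted in ascending order."""
    best = max(activity_by_condition, key=activity_by_condition.get)
    top = activity_by_condition[best]
    within = sorted(c for c, a in activity_by_condition.items() if a >= threshold * top)
    return best, within


if __name__ == "__main__":
    # Hypothetical relative activities at 30-90 degrees C in 10-degree steps.
    by_temperature = {30: 40, 40: 75, 50: 100, 60: 80, 70: 30, 80: 5, 90: 0}
    print(optimum_and_range(by_temperature))
```

The same function applies unchanged to a pH series keyed by pH value.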
Structure Determination-All diffraction data were collected at the Life Sciences Collaborative Access Team 21-ID-G Beamline at the Advanced Photon Source of Argonne National Laboratory (Argonne, IL). Diffraction data were processed, integrated, and scaled using HKL2000 (42). The selenomethionine-labeled SacteLam55A structure was initially phased to 1.9 Å using a single wavelength (0.97857 Å) with PHENIX AutoSol (43). Hyss (44) was used to determine the selenium substructure, and the results from Hyss indicated there were eight heavy atom sites per asymmetric unit (four sites per monomer), which matched the expected number of selenomethionines from the amino acid sequence. The experimentally determined electron density map was of excellent quality and was used as input for PHENIX AutoBuild (45). The SacteLam55A E502A substrate-bound structures were solved by molecular replacement with Phaser (46) using the SacteLam55A structure as a starting model. Because of the high resolution, the atomic displacement factors of the SacteLam55A and SacteLam55A E502A laminaritriose structures were refined anisotropically. All SacteLam55A structures were completed with alternating rounds of model building in Coot (47) and refinement in PHENIX (48). Surface areas were calculated by PISA (49).
To examine the dynamic motion of several residues surrounding the active site, the glucose-bound and all of the substrate-bound SacteLam55A E502A structures were subjected to an ensemble refinement (50). We excluded the selenomethionine-labeled structure of the unbound enzyme from the ensemble refinement because of the presence of the anomalous signal from incorporated selenium. All model building and refinement steps were monitored using an Rfree value based on 5.0% of the independent reflections. Model quality was assessed with Molprobity (51) before deposition to the PDB. Structural images were generated with the PyMOL Molecular Graphics System (version 1.5.0.4 Schrödinger, LLC).

RESULTS
Structure of SacteLam55A-The data collection, refinement, and model statistics are presented in Table 1. The structure of selenomethionine-labeled SacteLam55A was solved using single wavelength anomalous dispersion. The initial phased and density-modified electron density map was of excellent quality, and individual residues could be easily identified. The structure contained two SacteLam55A molecules in the asymmetric unit and belonged to the P2₁2₁2₁ space group. The structure was refined to a resolution of 1.51 Å, had an Rwork (Rfree) of 11.7% (16.1%), and had interpretable electron density that corresponded to residues 58-605 and 57-605 for monomers A and B, respectively. Overall, the two monomers were nearly identical and had a Cα root mean square deviation (RMSD) of 0.163 Å for residues 58-605. Unless noted, figures and descriptions are of the A monomer.
In addition to SacteLam55A, we determined the structure of SacteLam55A E502A, an inactive form of the enzyme, in complex with glucose, laminaritriose (L3), laminaritetraose (L4), and laminarihexaose (L6) to the resolutions and final Rcryst (Rfree) values shown in Table 1. All substrate- and product-bound structures were solved by molecular replacement using SacteLam55A (PDB code 4PEW) as an initial model. Unlike SacteLam55A, the SacteLam55A E502A structures belonged to the P2₁ space group. The L3- and L4-bound structures had one monomer per asymmetric unit, whereas the glucose- and L6-bound structures had two monomers per asymmetric unit. Similar to SacteLam55A, the two monomers observed in the SacteLam55A E502A structures with either L6 or glucose bound were nearly identical and had Cα RMSDs of 0.071 and 0.108 Å, respectively. Whereas both monomers of the L6 structure had clear interpretable electron density for the substrate, density corresponding to glucose was found only in the B monomer of the SacteLam55A E502A structure. Regardless of the crystal form or unit cell parameters, all SacteLam55A monomers were equivalent and had an average Cα RMSD of 0.154 Å.
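The Cα RMSD values used here to compare monomers are the root mean square of the distances between equivalent atoms. As a minimal sketch of the quantity being reported, the code below assumes the two coordinate sets are already superposed and listed in matching residue order; the superposition step is not shown, and the coordinates are invented for illustration.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equal-length lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


if __name__ == "__main__":
    # Two hypothetical, pre-superposed sets of C-alpha positions (in angstroms).
    monomer_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    monomer_b = [(0.0, 0.0, 0.2), (1.0, 0.0, -0.2)]
    print(rmsd(monomer_a, monomer_b))
```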
Overall Structure-SacteLam55A is composed of two right-handed β-helical domains connected via an extended 49-residue linker (residues 258-306) that includes two antiparallel β-sheets (Fig. 1A). The N-terminal (residues 58-257, blue ribbon) and C-terminal (residues 307-605, red ribbon) domains pack against one another, burying ∼1909 Å². The torsion angle between the β-helical domains is ∼25°, which is similar to the domain arrangement in PcLam55A (20). A hallmark of the right-handed β-helical fold is repetition of a three β-strand/turn motif, termed a coil (52). In SacteLam55A, there are seven and ten coils in the N-terminal and C-terminal domains, respectively (Fig. 1A). Although the β-strands that comprise the N-terminal and C-terminal domains are similar in SacteLam55A, the length and amino acid composition of the loops are highly divergent. Additionally, the SacteLam55A N-terminal domain is truncated by ∼100 residues when compared with the C-terminal domain. The smaller size (548 residues in PDB code 4PF0) and structural asymmetry distinguish SacteLam55A from the larger PcLam55A (752 residues included in PDB code 3EQO), where both domains contain 12 coils and are composed of a similar number of residues (20). The lengths of the loops that connect the individual coils of the β-helical domains also contribute to the increased size of PcLam55A. These differences are shown in Fig. 1B.
Substrate-binding Cleft-The substrate-binding cleft of SacteLam55A is located at the interface between the N-terminal and C-terminal β-helical domains and does not contain any contribution from the linker joining them (Fig. 1A). The L6-bound structure revealed that six sugar-binding subsites comprise the SacteLam55A substrate-binding cleft (Fig. 2). We have labeled these subsites from the nonreducing to the reducing end (−1, +1, +2, +3, +4, and +5) using the previously developed formalism for GHs (53), where the hydrolyzed β-1,3-glycosidic bond is located between the −1 and +1 subsites (Fig. 2). Similar positions are occupied in the L3- and L4-bound structures. In SacteLam55A E502A bound to glucose (PDB code 4PEX), the protein contacts to the ligand bound in the −1 site are identical to the contacts observed in the oligosaccharide-bound structures. Similar contacts are also observed in PcLam55A (PDB code 3EQO) bound to the product analog gluconolactone (20).
The asymmetry caused by the truncation of the N-terminal domain compared with the C-terminal domain and the torsion angle (∼25°) between the two domains cause the substrate-binding cleft to adopt a curved conformation that follows the external contours of the N-terminal domain. This curvature matches the conformation of β-1,3-linked glucans (Fig. 2), which has been noted for GH16 ZgLamA (54), and excludes the linear conformation of β-1,4-linked polymers such as cellulose. Unlike GH16, GH17, and GH64 (54-56), where the substrate-binding clefts are open at both ends, allowing endo-glucanase activity, the binding cleft of SacteLam55A is capped at one end by Phe-153, Trp-444, and Trp-446 (Fig. 2, purple sticks). This "aromatic block" closes the substrate-binding cleft and so restricts access to one direction along the domain boundary. Similar residues are observed in this position in aligned PcLam55A (20) and are conserved throughout the rest of the GH55 family (Figs. 2 and 3, purple sticks and surface). Thus the steric constraints revealed by the structures of substrate-bound enzyme and the distribution of end products (described below) show that SacteLam55A releases glucose from a β-1,3-linked oligosaccharide, making it an exo-β-1,3-glucanase (EC 3.2.1.58).
Most of the residues that interact with bound substrates are provided by the C-terminal domain. Additionally, the majority of these interactions are localized to the −1 subsite, where the terminal glucosyl group of substrate is bound (Fig. 4). In sum, every oxygen atom in the glucosyl group bound in the −1 subsite except O6 has at least one favorable hydrogen bonding interaction within 3 Å. Moreover, all of the residues that form the −1 subsite are highly conserved in both bacterial and fungal GH55s, suggesting that a similar binding mode will be present in all. Indeed, the residues from PcLam55A that form part of the active site by contacting the bound product analog gluconolactone (PDB code 3EQO; Ref. 20) are all conserved and occupy overlapping positions with the residues observed to interact with glucose in the SacteLam55A E502A structure.
The structures of SacteLam55A E502A bound to substrates provide additional information on the interactions with the enzyme. Interactions with sugars at the +1 and +2 subsites are limited when compared with the −1 subsite. For example, the glucosyl group at the +1 subsite is coordinated by Thr-149 and Gln-217 from the N-terminal domain and is in close proximity to Tyr-543 from the C-terminal domain. Although both Thr-149 and Gln-217 are strictly conserved in bacterial GH55 members, Tyr-543 is not as well conserved in the bacterial GH55 members and can be tyrosine, phenylalanine, or asparagine. In addition to interacting with the +1 subsite, Gln-217 hydrogen bonds to the glucose moiety at the +2 subsite. The only other interaction with the +2 subsite is through the formation of a stacking interaction between bound substrate and Trp-196, which is conserved in bacterial GH55s but not in fungal GH55s (Fig. 2).
Figure legend: The substrate-binding cleft located at the interface between the N-terminal and C-terminal domains contains at least six sugar binding subsites (−1, +1, +2, +3, +4, and +5). Residues from the C-terminal domain form extensive interactions with the glucosyl groups bound in the −1 and +1 subsites. The position of the predicted catalytic acid (Glu-502, yellow sticks) was obtained from alignment of the SacteLam55A structure (PDB code 4PEW) with SacteLam55A E502A bound to L6 (PDB code 4PF0). Phe-153, Trp-444, and Trp-446 (purple sticks) form an "aromatic block" that caps one end of the substrate-binding cleft. Two residues (Phe-143 and Trp-144, orange sticks) occupy space that overlaps with a putative substrate binding cleft proposed for PcLam55A. Trp-196 and Tyr-194 (blue sticks) from the N-terminal domain form stacking interactions with sugars at the +2 and +5 subsites because L6 is wrapped around the N-terminal domain. The solvent-exposed C6 hydroxyl groups of the bound L6 are shown as green sticks. All C6 hydroxyl groups are fully exposed to solvent.
Two other striking features of the SacteLam55A substrate-binding cleft are the relatively few interactions with bound substrate at the +3, +4, and +5 subsites and its solvent accessibility. Unlike the −1, +1, and +2 subsites, which are partially occluded from solvent, the +3, +4, and +5 subsites are exposed to solvent (Fig. 2). There are no hydrogen bonding or aromatic residues that are close enough to interact with either the +3 or +4 subsites, and the only interaction with the +5 subsite comes from a stacking interaction with Tyr-194 (Fig. 2), which is poorly conserved throughout the GH55 family.
SacteLam55A has no pocket adjacent to the −1 subsite comparable with the suggested β-1,6 branched sugar binding pocket in PcLam55A (20). Instead, Phe-143 and Trp-144 of SacteLam55A (Fig. 3, orange spheres) are adjacent to the −1 subsite and adopt a conformation that fills the space proposed for this putative branched sugar binding site. All C6 hydroxyl groups of bound L6, including the glucose moiety in the −1 subsite, are fully exposed to solvent and point away from the substrate-binding cleft (Fig. 2, green atoms). Indeed, Gln-150 (orange sticks) is the closest residue to the C6 hydroxyl group of the glucose moiety in the −1 subsite (∼3.5 Å; short dashed line in Fig. 4). This accessibility accounts for the reactivity of the enzyme with laminarin, as the infrequent β-1,6 branching can be accommodated along the entire length of the substrate-binding cleft. Interestingly, Phe-143 and Trp-144 are not well conserved among the bacterial GH55 enzymes (Fig. 3, orange spheres), suggesting that structural variations are possible near the −1 subsite.
Dynamics of Substrate Binding-A comparison of all SacteLam55A structures revealed that the quality of the electron density for Tyr-194 and Trp-196 depended on the presence and length of bound substrate (Fig. 5, left panels). These differences in electron density suggested that Tyr-194, Trp-196, and other surrounding residues were mobile and might play a role in substrate binding. To investigate the dynamic nature of the aromatic residues in the substrate-binding cleft, the structures of SacteLam55A E502A with glucose, L3, L4, and L6 bound were refined using ensemble refinement (50). Ensemble refinements of crystal structures have been shown to extract protein dynamics from "static" crystal structures (57). In all cases, the SacteLam55A E502A ensemble structures had lower Rwork (Rfree) values than the static structures (Table 2). The loop that contained Tyr-194 and Trp-196, which formed part of the substrate-binding cleft, displayed substrate-dependent flexibility (Fig. 5, right panels) that was not observed elsewhere in SacteLam55A.
These comparisons show that, in the absence of substrate, the side chains of Tyr-194 and Trp-196 lacked well defined electron density, consistent with their ability to adopt numerous conformations (Fig. 5A). Furthermore, comparison of the electron density of Tyr-194 and Trp-196 in the A and B monomers of the glucose-bound structure (where glucose was either absent or present, respectively) showed that the presence of glucose did not improve the electron density around these two residues. Thus the presence of substrate at the −1 subsite was not sufficient to order either Tyr-194 or Trp-196. In the L3-bound structure, where the −1, +1, and +2 subsites were occupied, Trp-196 interacted with substrate at the +2 subsite. The interaction at the +2 subsite helped to stabilize Trp-196, and this interaction led to a reduction in the number of discrete positions needed to model its electron density (Fig. 5B). Despite the lack of interactions between Tyr-194 and L3, a reduced number of positions was also needed to account for the electron density of Tyr-194. A similar pattern was observed in the L4-bound structure (not shown). In the L6-bound structure, which had a fully occupied substrate-binding cleft that included interactions with both Trp-196 and Tyr-194, there was no appreciable change in the distribution of positions observed for Tyr-194 and Trp-196 when compared with the L3- and L4-bound structures (Fig. 5C). These results indicate that occupation of the −1, +1, and +2 sites is sufficient to orient these distal residues and help position the substrate for catalysis.
MAY 8, 2015 • VOLUME 290 • NUMBER 19

Structure and Functional Characterization of SacteLam55A
Active Site Residues-Inverting GHs use acidic and basic residues to activate a water nucleophile to cleave glycosidic bonds (58,59). In the following, the position of the predicted catalytic acid (Glu-502, yellow sticks) was obtained from alignment of SacteLam55A (PDB code 4PEW) with SacteLam55A E502A bound to L6 (PDB code 4PF0). There were three carboxylate-containing residues within 5 Å of the +1 subsite (Asp-449, Glu-480, and Glu-502; Figs. 4 and 6). These residues are strictly conserved throughout the GH55 family, and the corresponding residues adopt a similar conformation in PcLam55A (Ref. 20; see Fig. 6). Although all three of these residues were capable of interacting with a glucosyl group bound in the +1 subsite, only Glu-502 was capable of interacting with the anomeric oxygen (2.3 Å), indicating that it was likely the catalytic acid. Furthermore, although Tyr-505 was close to the anomeric oxygen (3.3 Å), it was better positioned to provide a hydrogen bond (2.7 Å) to O4 of glucose in the −1 position and to the carboxylate group of Glu-502. The role of Tyr-505 may therefore be to orient Glu-502 and the anomeric oxygen for reaction.
Neither Asp-449 nor Glu-480 could achieve a position relative to Glu-502 similar to that observed in other inverting GH families (58-60). Thus we explored other options for the nucleophile and identified a water (Fig. 6) positioned ∼3.4 Å from C1 of the glucosyl moiety in the −1 subsite as a likely candidate (58,59). The side chains of Gln-174 and Ser-198 (two rotamers shown) and the backbone carbonyl oxygen of Thr-149, which are conserved in both bacterial and fungal GH55, coordinated a highly ordered water molecule in the active site (Fig. 6 and Ref. 20), whereas Glu-480 (also conserved) provided a stabilizing hydrogen bond to the side chain of Gln-174, further organizing this potential proton relay system for activation of water. A reaction pathway is indicated by arrows in Fig. 6, along with blue sticks showing the positions of the corresponding residues and water in the PcLam55A active site.
Catalytic Reaction-Among the polysaccharides tested, pure laminari-oligosaccharides and the natural product laminarin were the only substrates that were hydrolyzed by SacteLam55A. The end products from an exhaustive hydrolysis of pure laminari-oligosaccharides L5, L4, or L3 were a mixture of glucose and laminaribiose (L2) in a ratio consistent with removal of glucose groups until the remaining oligosaccharide was L2, which did not react. Thus L5 gave 3 mol of glucose and 1 mol of L2, L4 gave 2 mol of glucose and 1 mol of L2, whereas L3 gave 1 mol of glucose and 1 mol of L2. SacteLam55A had a maximal specific activity at 50°C and pH 6 and maintained 70% or more of the maximal activity from 35 to 65°C and from pH 6.0 to 9.0. Other kinetic parameters for SacteLam55A that reacted with soluble laminarin are shown in Table 3.
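The product ratios reported above for L3, L4, and L5 are what a simple exo-hydrolysis bookkeeping model predicts: one glucose is released per cleavage until only the unreactive L2 remains. The short Python sketch below illustrates that arithmetic only. The function name `exo_hydrolysis_products` and the parameter `min_reactive_dp` are hypothetical names introduced for this illustration, and the model assumes, as the data suggest, that oligosaccharides shorter than L3 do not react.

```python
def exo_hydrolysis_products(dp, min_reactive_dp=3):
    """Return (mol glucose released, length of the remaining oligosaccharide)
    per mol of a laminari-oligosaccharide with `dp` glucose units.

    Assumption (illustrative model, not the authors' code): each cleavage
    removes exactly one glucose, and chains shorter than `min_reactive_dp`
    are not substrates.
    """
    if dp < 1:
        raise ValueError("degree of polymerization must be at least 1")
    glucose = 0
    while dp >= min_reactive_dp:
        dp -= 1        # one glucose unit is cleaved from the chain
        glucose += 1   # and released as free glucose
    return glucose, dp


for n in (3, 4, 5):
    glucose, remainder = exo_hydrolysis_products(n)
    print(f"L{n}: {glucose} mol glucose + 1 mol L{remainder}")
```

Under these assumptions the loop reproduces the observed stoichiometry (for example, 3 mol of glucose and 1 mol of L2 from L5). An endo-acting enzyme would instead be expected to give a different product distribution, which is why the exhaustive hydrolysis data are diagnostic.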
Guided by the structure, we tested active site residues that might participate in catalysis by generation of mutations at nine different residues in SacteLam55A. SacteLam55A with mutations at Glu-480 (E480A and E480Q) or Glu-502 (E502A and E502Q) was inactive. Mutations at Gln-174 (Q174A and Q174N), Asp-449 (D449A and D449N), or Tyr-505 (Y505A) had 10²- to 10⁴-fold reductions in kcat/Km (Table 3), which arose from a combination of changes in both catalytic rate and binding affinity. This likely reflects subtle changes in positioning of the substrate and ability to activate the water nucleophile caused by these mutations. SacteLam55A S198A had kcat/Km of 700 ± 200, which is ∼13-fold lower than SacteLam55A, while retaining a similar apparent Km value (∼0.6 mg/ml versus ∼0.9 mg/ml), implicating Ser-198 in the proposed proton relay system rather than in substrate binding.
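To compare the mutants on a common scale, a fold reduction in kcat/Km can be restated as the percentage of wild-type catalytic efficiency that remains. The following minimal Python sketch performs only that conversion. The helper name `residual_efficiency_percent` is a hypothetical name introduced here, and the inputs are the approximate fold reductions quoted in the text rather than values recomputed from Table 3.

```python
def residual_efficiency_percent(fold_reduction):
    """Percent of wild-type kcat/Km retained after a given fold reduction.

    Illustrative helper (not from the original study): a 13-fold
    reduction leaves 100/13 percent of the wild-type value.
    """
    if fold_reduction <= 0:
        raise ValueError("fold reduction must be positive")
    return 100.0 / fold_reduction


# Approximate fold reductions quoted in the text.
for label, fold in [("S198A", 13), ("lower bound, 10^2", 1e2), ("upper bound, 10^4", 1e4)]:
    print(f"{label}: {residual_efficiency_percent(fold):.2f}% of wild type")
```

On this scale the ∼13-fold reduction for S198A corresponds to roughly 8% of the wild-type value, whereas the 10²- to 10⁴-fold reductions correspond to 1% or less.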
Functional Survey of the GH55 Family-To date, the majority of studies on GH55 have been on fungal enzymes (19,20,22-27,61). With the exception of a single report on the cloning and sequence analysis of GluA from Arthrobacter sp. NHB-1 (21), little is known about the more numerous bacterial members of this family. To investigate the natural diversity found within GH55, we generated a phylogenetic tree of all nonredundant GH55 sequences. The tree shows that the nonredundant GH55s (177 total, 102 bacterial, and 75 fungal) form two distinct clades (Fig. 7), with truncation of the N-terminal domain accounting for the separation of the bacterial and fungal clades. Additionally, the bacterial clade could be further subdivided into five subclades that are composed of either mostly Actinobacteria (subclades 1, 2, and 4) or Firmicutes, Clostridia, Proteobacteria, and others (subclades 3 and 5; Fig. 7 and Table 4). The sequences within bacterial subclades 1-4 had a Blast bit score of >200 (E value <10⁻⁶⁹), indicating that they are closely related. Sequences in the fungal subclade had a Blast bit score of <50 (E value >10⁻⁹), indicating greater divergence, whereas bacterial subclade 5, a small out-group, shared little sequence identity with other bacterial GH55 enzymes (bit score <30, E value >0.001). Although the active site residues described above were also found in subclade 5, the predicted lengths and positions of loops and gaps needed to generate the alignment accounted for the overall divergence of subclade 5 from SacteLam55A.
To probe the reactivity in the GH55 family, we identified 56 genes for synthesis (32% total coverage of the 177 nonredundant GH55), and 51 of these were successfully synthesized (29% total coverage, 45 bacterial GH55 and 6 fungal GH55). All 51 synthesized genes were successfully expressed as soluble proteins using cell-free translation (Fig. 7 and Table 4). The translated proteins were assayed using a panel of substrates (43), and 33 of the bacterial and 1 fungal GH55 were found to be active (19% functional coverage). Of the 10 different substrates tested, laminarin was the only one hydrolyzed by these 34 enzymes. The specific activities and pH and temperature optima for laminarin hydrolysis were determined for each (Fig. 7 and Table 4). Overall, the bacterial GH55s identified by this screen displayed a broad range of catalytic rates and pH and temperature optima. Interestingly, no catalytic activity was detected for bacterial subclade 5, the small phylogenetic out-group. Given the divergence in the lengths and positions of loops and gaps described above for subclade 5, further studies will be needed to ascertain whether these proteins are properly classified in GH55 or perhaps have specificities for substrates other than those tested here.
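The coverage percentages quoted above all use the 177 nonredundant GH55 sequences as the denominator. The brief Python sketch below makes that arithmetic explicit using only the counts given in the text. The variable and function names are illustrative choices and do not come from the original work.

```python
# Counts taken from the text: 177 nonredundant GH55 (102 bacterial + 75 fungal).
TOTAL_NONREDUNDANT = 102 + 75


def coverage_percent(count, total=TOTAL_NONREDUNDANT):
    """Coverage of the nonredundant family, rounded to the nearest percent."""
    return round(100 * count / total)


stages = {
    "selected for synthesis": 56,
    "successfully synthesized": 45 + 6,  # bacterial + fungal
    "active on laminarin": 33 + 1,       # bacterial + fungal
}

for stage, count in stages.items():
    print(f"{stage}: {count}/{TOTAL_NONREDUNDANT} = {coverage_percent(count)}%")
```

Rounded to the nearest percent, these counts give 32, 29, and 19%, in agreement with the values stated in the text.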

DISCUSSION
Active Site Residues-The structures of SacteLam55A bound to substrates define the residues that make up the active site. They are conserved throughout the GH55 family and also define the origin of exo-glucanase activity observed for SacteLam55A. Although it was previously suggested that some GH55 might have endo-β-1,3-glucanase activity (21,26,62), sequence conservation in the −1 and +1 sugar binding subsites and in the "aromatic block" indicates that most, if not all, GH55s are exo-β-1,3-glucanases. This conclusion is in accord with biochemical studies carried out on PcLam55A (20).
Mechanism-Inverting GHs use acidic and basic residues that are separated by ∼10 Å to activate water to serve as the nucleophile in the hydrolysis reaction (58,60,63,64). The role of the catalytic acid is to protonate the anomeric oxygen and thus stabilize the leaving hydroxyl group, whereas the catalytic base activates water for nucleophilic attack on the anomeric carbon (Fig. 6). The substrate-bound SacteLam55A structures show that Glu-502 is best positioned to interact with the anomeric oxygen at the −1 subsite as the general acid. The conservation of this residue throughout the entire GH55 family and the loss of all activity in SacteLam55A bearing the E502A and E502Q mutations support this functional assignment.
Neither Asp-449 nor Glu-480 (two other conserved carboxylates) adopts a position relative to Glu-502 that is similar to the active sites in other inverting glucanases (58,60). Furthermore, it is unlikely that Asp-449 can function as a base to activate water because it is over 6 Å from the anomeric oxygen, has no water molecules in close proximity, and provides stabilizing interactions with O4 of the glucosyl group bound in the −1 subsite (Fig. 6) and with Trp-446, which is part of the "aromatic block." Mutation of Asp-449 to either Asn or Ala decreased kcat/Km by ∼10² and ∼10⁴, respectively, with a larger proportional decrease in kcat than Km. Therefore we propose that Asp-449 has a primary role in positioning the substrate, with consequences for the catalytic rate.
Glu-480 is positioned on the same side of the active site as Glu-502, and SacteLam55A mutants that lacked Glu-480 (E480A and E480Q) were completely inactive. Thus Glu-480 has an important role in catalysis. One possibility is that it functions as a general base in the catalytic mechanism through interaction with Gln-174, Ser-198, and the active site water proposed to be the nucleophile. Another possible function is to assist in placement of the substrate. Although Glu-480 is too distant to interact with the proposed water nucleophile directly, it does interact with both O2 and O3 and so assists in precise positioning of substrate in the −1 subsite. Glu-480 also participates in the proton relay network that includes the side chains of Gln-174 and Ser-198 and the carbonyl group of Thr-149. SacteLam55A S198A had ∼8% residual activity, indicating that Ser-198 plays an important, but not essential, role in catalysis. Moreover, the Q174A mutation gave more severe defects in catalysis than S198A but nevertheless retained measurable activity.
Proton relay networks have been proposed for the inverting GH6 and GH97 families (63,65). For example, the catalytic base (Asp-226) of exocellulase Cel6B from Thermobifida fusca activates water indirectly through a second water that is coordinated by a highly conserved Ser residue (63,64,66). Glu-480 of SacteLam55A appears to play a role similar to that of Asp-226 of Cel6B in the indirect activation of water. Additionally, in the mechanism proposed for SusB from Bacteroides thetaiotaomicron, a GH97 α-glucosidase, two Glu residues (Glu-439 and Glu-508) work together to activate the water nucleophile (65).
FIGURE 7. Phylogenetic tree of the GH55 family. GH55 was separated into five bacterial subclades (subclades 1-5 numbered around circle; light brown or no background) and a fungal subclade (green background). Sequences selected for gene synthesis, cell-free expression, and enzyme assays are indicated by red nodes in the tree. Specific activity at pH 6 and 50°C and the pH optimum and the highest temperature where >70% of maximum activity was observed for each enzyme in a 2-h assay are mapped as concentric rings moving outward from the center of the tree. Published values from fungal GH55 are also included. The positions in the tree of P. chrysosporium Lam55A and SacteLam55A, for which structures are available, are marked with red stars.
Protein Dynamics in the Substrate Binding Cleft-Comparisons of SacteLam55A structures showed that the quality of electron density for Tyr-194 and Trp-196 was dependent on the presence of substrate at the +2 subsite (Fig. 5). The interactions of these residues observed in the L3, L4, and L6 structures likely help retain the oligosaccharide for bond cleavage. After bond cleavage, glucose can be released through the open access to solvent that is adjacent to the −1 subsite, leaving it available to rebind substrate.
Aromatic residues contribute to the processivity of glycosyl hydrolases by providing dynamic interactions with a bound polysaccharide chain. GH55 lacks a tunnel as in GH6 (67) and GH7 (68,69) and also lacks the extensive track of aromatic residues seen in processive GH18 chitinases (70). Although speculative, the dynamic motion of Tyr-194 and Trp-196 suggested by the ensemble refinements could allow the remaining bound oligosaccharide to processively slide along the substrate-binding cleft to place a new sugar into the −1 subsite without being released from the active site cleft. If this were so, the flexibility of Tyr-194 and Trp-196 could play an important role in the GH55 reaction beyond simple sequestration of L3 and longer laminarioligosaccharides.
TABLE 4 footnotes.
Fig. 7, starting from the initial position marked in the subclade with • and moving counterclockwise. The sequence name, estimated protein concentration in the cell-free translation reaction (μg/ml), pH and temperature optima, and specific activity at these optimal values are shown. Assays contained 10 μl of cell-free translation reaction. A unit of enzyme activity is defined as the release of 1 μmol of reducing sugar per minute from laminarin at the stated optimal pH and temperature. The optima and effective range for temperature and pH (defined as retention of at least 70% of maximum activity in a standard assay) are also shown.
b ND indicates no enzyme activity was detected above the level of 0.001 μmol of reducing sugar produced after 24 h of reaction.
c Activity observed above the level of 0.001 μmol of reducing sugar produced after 24 h.
d Cloning and sequence analysis of GluA from Arthrobacter sp. NHB-1 have been previously reported (21).
e Biochemical and structural studies of PcLam55A purified from the natural host were reported elsewhere (20).
Secondary Binding Site-During the final stages of refining the L3-, L4-, and L6-bound SacteLam55A structures, clear and interpretable electron density corresponding to a bound oligosaccharide was identified on the N-terminal domain (Fig. 8). An additional L3, glucose, and L5 molecule was modeled in the L3-, L4-, and L6-bound SacteLam55A structures, respectively. There was residual electron density at the secondary binding site (SBS) of the L6 and L4 structures that suggested there were additional bound sugar molecules, but these molecules were not modeled because of either disorder or low occupancy. Unlike the substrate-binding cleft, the SBS is located on the N-terminal domain adjacent to the substrate-binding cleft and is solvent-exposed (Fig. 8). The proximity of the two oligosaccharide-binding sites may allow a sufficiently long laminarioligosaccharide to bridge the two sites (Fig. 8A, dashed white line).
Interactions with the oligosaccharide at the secondary site are not as extensive as those observed between enzyme and substrate in the substrate-binding cleft. Several residues, including Asn-112, Gln-114, Asp-140, Arg-196, and Asp-192, form direct hydrogen bonds with the bound L5 molecule (Fig. 8B). BcX, a single domain xylanase from Bacillus circulans, contains an SBS that has a similar amino acid composition and proximity to the active site as the SBS of SacteLam55A (71). Ludwiczek et al. (71) showed that the SBS of BcX binds both soluble and insoluble xylan substrates and may play a role in localization. Additional GH families have been shown to contain SBS, and it has been proposed that these auxiliary binding sites affect substrate targeting, processivity, and activity (72). It is possible that the SBS of SacteLam55A plays a role in localizing SacteLam55A to the surface of insoluble laminarins in a manner similar to a carbohydrate-binding module (73).
The residues that compose the secondary binding site are conserved in subclade 1 (primarily Streptomyces), of which SacteLam55A is a member. Otherwise, the residues that compose the secondary binding site are not conserved within the GH55 family. For example, PcLam55A has a helix that overlays in space with part of the subclade 1 secondary binding site. Interestingly, however, PcLam55A Tyr-135, which is located at the start of this helix, was suggested to play a role in substrate binding (20).
Other Members of GH55-We used biochemical assays to provide a broader functional annotation of the GH55 family (Fig. 7 and Table 4). Laminarin was the only substrate that gave a positive assay response from the enzymes tested. The enzyme from Streptomyces hygroscopicus subsp. jinggangensis TL01 (GenBank™ locus AEY93509) had the highest specific activity of all enzymes tested for reaction with soluble laminarin at pH 6 and 50°C.
Subclade 1 and 2 enzymes, primarily from Actinomycetales, had similar ranges of optima for pH and temperature. The subclade 3 enzymes came from a broader distribution of bacterial genera and showed more diversity in specific activity and pH optimum, and their temperature optima were generally lower than those observed for the subclade 1 enzymes. In subclade 4, also primarily from Actinomycetales, the enzymes from soil organisms Arthrobacter phenanthrenivorans Sphe3 (ADX71276) (74), Deinococcus peraridilitoris DSM19664 (AFZ68118), and Micromonospora sp. L5 (ADU06434) (75) had the highest temperature optima and largest range of pH stabilities.
Six fungal GH55s were also expressed and assayed, and only PcLam55A showed activity with laminarin. Because there are eight disulfide bonds in PcLam55A (20), incorrect formation of disulfide bonds during cell-free translation may have contributed to the low number of active fungal GH55 detected.
Conclusion-This study of SacteLam55A provides an in-depth characterization of the structure and function of the GH55 family. By use of mutagenesis and kinetic analyses, it provides new insights into the catalytic residues and also the residues that form an extended substrate-binding cleft. The combination of gene synthesis, cell-free translation, and assays using a diagnostic panel of substrates across the entire GH55 family represents, to our knowledge, the most complete functional mapping of an entire GH family available to date. Biochemical assays show that laminarioligosaccharides longer than two glucose units react exclusively by an exo-glucanase reaction, and crystal structures of the enzyme-substrate complexes have provided a molecular level understanding of the origin of this specificity.