Structural and mechanistic analysis of a β-glycoside phosphorylase identified by screening a metagenomic library

Glycoside phosphorylases have considerable potential as catalysts for the assembly of useful glycans for products ranging from functional foods and prebiotics to novel materials. However, the substrate diversity of currently identified phosphorylases is relatively small, limiting their practical applications. To address this limitation, we developed a high-throughput screening approach using the activated substrate 2,4-dinitrophenyl β-d-glucoside (DNPGlc) and inorganic phosphate for identifying glycoside phosphorylase activity and used it to screen a large insert metagenomic library. The initial screen, based on release of 2,4-dinitrophenyl from DNPGlc in the presence of phosphate, identified the gene bglP, encoding a retaining β-glycoside phosphorylase from the CAZy GH3 family. Kinetic and mechanistic analysis of the gene product, BglP, confirmed a double displacement ping-pong mechanism involving a covalent glycosyl–enzyme intermediate. X-ray crystallographic analysis provided insights into the phosphate-binding mode and identified a key glutamine residue in the active site important for substrate recognition. Substituting this glutamine for a serine swapped the substrate specificity from glucoside to N-acetylglucosaminide. In summary, we present a high-throughput screening approach for identifying β-glycoside phosphorylases, which was robust, simple to implement, and useful in identifying active clones within a metagenomics library. Implementation of this screen enabled discovery of a new glycoside phosphorylase class and has paved the way to devising simple ways in which enzyme specificity can be encoded and swapped, which has implications for biotechnological applications.

lyzed by most GPs is close to 1; thus the equilibrium position can be tipped in favor of glycoside synthesis by manipulation of reaction conditions (11)(12)(13).
Since the discovery of the first glycogen phosphorylase (GP) in 1938 (14), only 29 distinct new GP activities have been identified; the majority were in the past 15 years (15). However, given the vast number of glycoside hydrolases (GHs) known, it seems likely that 30 is a significant underestimate of the actual number of GP activities present in nature, especially because the use of phosphorolysis to metabolize glycans is inherently more energetically favorable for a cell than hydrolysis because the released sugar 1-phosphates feed directly into the glycolysis pathway without the need for expenditure of ATP (16,17). Given this metabolic efficiency advantage, and considering the wide range of diverse glycans that are metabolized by microbes, it is probable that glycoside phosphorylases are more widespread than previously thought. Indeed, it seems likely that numerous forms of carbohydrates are metabolized through phosphorolysis, suggesting an abundance of GPs to discover in nature.
Functional metagenomics screening offers a means to search for novel biocatalysts from genetic material drawn directly from the environment (18). Metagenomic libraries provide access to the vast reservoir of uncultivated genetic diversity encoded in microbial communities inhabiting natural and engineered ecosystems (19-21). Emerging sophisticated sequence- and function-based screening technologies are being deployed to identify novel enzymes, including CAZymes, within these libraries (21)(22)(23)(24). However, to our knowledge no function-based metagenomics screening approach has yet been developed that targets GP-encoding genes. A central challenge in designing the needed screen is that of distinguishing phosphorylase activity from the related hydrolase activity in a high-throughput manner (Fig. 1A).
We report here the development and implementation of such a high-throughput screening approach to identify GP activity. By screening of a library derived from a passive mine tailings biochemical reactor system (BCR) fed with lignocellulosic biomass (25), we identified a previously unknown β-retaining GP from CAZy family GH3, and we report on its structural and mechanistic characterization. Our screening approach is based on the use of activated aryl glycosides as easily monitored substrates for glycoside phosphorylases when deployed in the presence of phosphate. The BCR library is a large insert (fosmid) library with an average insert size of 40 kilobases first described by Mewis et al. (25). The authors used the BCR library to search for novel cellulose-degrading enzymes. Therefore, in testing the parallel screening approach to identify GPs, we focused in this case on identifying fosmid clones encoding enzymes with the capacity of phosphorolyzing the β1,4-glucosidic linkages of cellulose and cello-oligosaccharides.

Development and testing of the metagenomic screen
The design of the screen was based upon the notion that, because many GPs are members of CAZy GH families, they might accept an activated aryl glycoside substrate, transferring the glycosyl moiety to added phosphate. This concept has not been tested previously, to our knowledge, apart from one demonstration with a nucleotide phosphorylase (26), although the concept has parallels with previous work on nucleotide-dependent glycosyltransferases (27). 2,4-Dinitrophenyl glucoside (DNPGlc) was used to test this idea because it is highly activated (pKa of 2,4-dinitrophenol = 4.0) (28) and has proved to be a near-universal substrate for glucosidases. Furthermore, the low pKa value of the phenol allows direct, continuous assays to be performed at pH values down to below 4 without the need to add base in a stopped manner. The choice of a glucoside was based on the cellulolytic origin of the library to be screened, which thus might contain cello-oligosaccharide-degrading phosphorylases, especially because a number of cellulases and β-glucosidases had already been identified within this library (25).
As shown in Fig. 2A, all three enzymes (RtCBP, RtCDP, and Nag3) cleave DNPGlc in the presence of phosphate, as can be observed by monitoring the increase in absorbance at 400 nm. Importantly, all three enzymes are only minimally active in the absence of phosphate. Confirmation that the rate stimulation is due to phosphorylase action was provided by TLC analysis of reaction mixtures, which revealed that all three catalyzed the transfer of the glucosyl moiety of DNPGlc to inorganic phosphate, thereby forming Glc-1-P (Fig. 2B). These findings therefore validate the concept of the screen, which proceeds as follows. 1) Lysed extracts from metagenomic clones are initially assayed using DNPGlc in the presence of phosphate to detect both GHs and GPs. 2) Those clones that show activity are then rescreened in the absence of phosphate to weed out the glycosidases. 3) Because phosphate could also stimulate GH activity through a specific binding effect of some sort, or simply through activation at a higher ionic strength, phosphate dependence alone can give false positives. 4) TLC analysis of this much smaller number of reaction mixtures is therefore used to confirm the presence of the sugar phosphate product. Fosmid clones capable of producing Glc-1-P are sequenced and analyzed to identify the open reading frames (ORFs) responsible for the observed GP activity. An overview of the screening process is provided in Fig. 3.

Functional screen
The BCR fosmid library was constructed using Escherichia coli EPI300 as an expression host and contained 18,048 clones in 47 × 384-well plates (25). Following replication of the BCR master library, a total of 880 clones failed to grow, leaving 17,168 clones to be screened. Functional screening using DNPGlc in the presence of phosphate yielded 54 active clones with activity greater than the mean + 4 S.D., a hit rate of 0.31% (Fig. 4B). To distinguish clones with GP activity from those that are GHs, the 54 hits were re-arrayed into a 96-well master plate and rescreened both in the presence and absence of phosphate (Fig. 4C). Each assay condition was run in triplicate, and positive phosphate dependence was determined by performing a t test between a clone's A400 values in 0 and 50 mM phosphate. A confidence level of 95% (p value < 0.05) was set as a threshold to determine whether to further validate the clone. Of the 54 clones from the master library, 12 displayed activity in the presence of phosphate exceeding the set threshold. The reaction products of these 12 clones were then examined by TLC analysis (Fig. 4D). Of the 12 clones so analyzed, two were found to produce Glc-1-P in the presence of phosphate (29K06 and 31P01).
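The two statistical filters described above (a mean + 4 S.D. cutoff on the whole library, then a t test on triplicate readings with and without phosphate) can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the t test is taken to be an equal-variance Student's t test judged against a tabulated two-tailed critical value for 4 degrees of freedom, and the text does not say whether the original analysis was one- or two-tailed.

```python
import math
import statistics

# Assumed two-tailed critical t value for p < 0.05 with 4 degrees of freedom
# (two groups of n = 3); taken from a standard t table, not from the paper.
T_CRIT_DF4 = 2.776


def primary_hits(a400_by_clone, n_sd=4.0):
    """Return clones whose A400 exceeds the library mean + n_sd * S.D."""
    values = list(a400_by_clone.values())
    threshold = statistics.mean(values) + n_sd * statistics.stdev(values)
    return [clone for clone, a400 in a400_by_clone.items() if a400 > threshold]


def pooled_t(with_pi, without_pi):
    """Student's t statistic (equal variances assumed) for two replicate sets."""
    n1, n2 = len(with_pi), len(without_pi)
    v1, v2 = statistics.variance(with_pi), statistics.variance(without_pi)
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    diff = statistics.mean(with_pi) - statistics.mean(without_pi)
    if se == 0:
        # Identical replicates: no variance to test against.
        return math.inf if diff > 0 else 0.0
    return diff / se


def is_phosphate_dependent(with_pi, without_pi, t_crit=T_CRIT_DF4):
    """True when A400 is significantly higher with phosphate present."""
    return pooled_t(with_pi, without_pi) > t_crit


# Hit rate reported in the text: 54 hits out of 17,168 screened clones.
hit_rate_percent = 100 * 54 / 17168
```

Clones passing both filters would still need product confirmation by TLC, because the statistics alone cannot separate true phosphorylases from phosphate-stimulated hydrolases.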

Identification of the active glycoside phosphorylases
Complete sequencing of the two fosmid clones revealed they were contiguous with one another, overlapping with 23,247 bp at 100% identity (Fig. S1A). Thirty one open reading frames (ORFs) were predicted from the 29K06 clone and 32 from 31P01. BLASTX query of these against the CAZy database revealed a novel GH3 ORF (hereafter referred to as bglP) located within the overlapping region of the two fosmids. Based on amino acid sequence analysis (Fig. 5), bglP was predicted to encode a β-glucosidase/N-acetylglucosaminidase (NagZ) from a hexosaminidase sub-group of GH3 characterized by the sequence motif (33) KH(FI)PG(HL)GX4D(ST)H. However, the motif found in bglP had substitutions at two residues: KH(FI)PGDGX4DQH (underlined residues correspond to Asp-193 and Gln-200 in Nag3 and boldface residues correspond to the His-Asp catalytic dyad). This modified motif is also present in the only other known glycoside phosphorylase in the GH3 family, Nag3 (31,32). To confirm that the activity detected was indeed associated with this gene, it was subcloned into a pET expression vector and heterologously expressed and purified. Indeed, activity assays of purified enzyme confirmed that BglP catalyzes the phosphate-dependent cleavage of DNPGlc seen from the 29K06 and 31P01 source clones (Fig. S1B).

Figure 1. A, generalized phosphorolysis (upper pathway) and hydrolysis (lower pathway) of β1-4 linked glycans. R = H or (glucose)n (n = number of glucose residues). GPs are distinguished from GHs by their use of a phosphate molecule to cleave the glycosidic linkage thereby producing a free sugar 1-phosphate. B, phosphorolysis of cellobiose or cello-oligosaccharides performed by the inverting phosphorylases RtCBP (R = H) and RtCDP (R = (glucose)n). C, proposed mechanistic scheme for a retaining β-glycosidase/phosphorylase. BglP and Nag3 both employ a double-displacement β-retaining mechanism involving a glycosyl-enzyme intermediate and act as preferential phosphorylases (k3P > k3W). Both enzymes possess the same active-site residues that act as the catalytic nucleophile (Asp) and the acid/base catalytic dyad (Asp and His).

Figure 2. Phosphate-dependent cleavage of DNPG by RtCBP, RtCDP, and Nag3. A, RtCBP, RtCDP, or Nag3 was incubated with 2 mM DNPGlc in 0 and 50 mM phosphate for 1 h at 37°C, and then absorbance was measured at 400 nm. Error bars indicate standard deviation (n = 3). B, RtCBP, RtCDP, or Nag3 was incubated with 20 mM DNPGlc in 0 and 50 mM phosphate and incubated for 2 h at 37°C, and then samples were spotted on TLC. The glycerol visible on the TLC plate is from the enzyme storage buffer. G1P, Glc-1-P.

GH3 β-GP identified from a metagenomic library

Whole library, metagenomic library in 384-well plate format was screened in the presence of DNPGlc and phosphate. Clones that gave an A400 value greater than mean + 4 S.D. were consolidated and re-arrayed into a master library. Consolidation, the Master library was screened with DNPGlc in the presence (×3) and absence (×3) of phosphate. Analysis, a Student's t test was performed between the averages of the A400 values of each clone in the absence and presence of phosphate. Clones possessing a significantly (p < 0.05) higher activity with phosphate present were analyzed for the production of glucose 1-phosphate using TLC.
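The two versions of the GH3 hexosaminidase sub-group motif quoted in the text, KH(FI)PG(HL)GX4D(ST)H and the modified KH(FI)PGDGX4DQH found in BglP and Nag3, translate directly into regular expressions. The sketch below is a hypothetical helper for flagging which form a protein sequence carries; the function name and the test sequences are invented for illustration and are not real BglP or Nag3 sequences.

```python
import re

# Canonical sub-group motif as written in the text: KH(FI)PG(HL)GX4D(ST)H,
# where X4 stands for any four residues.
CANONICAL_MOTIF = re.compile(r"KH[FI]PG[HL]G[A-Z]{4}D[ST]H")

# Modified motif reported for BglP and Nag3: KH(FI)PGDGX4DQH.
MODIFIED_MOTIF = re.compile(r"KH[FI]PGDG[A-Z]{4}DQH")


def classify_motif(protein_seq):
    """Return 'modified', 'canonical', or None for a one-letter protein sequence."""
    seq = protein_seq.upper()
    if MODIFIED_MOTIF.search(seq):
        return "modified"
    if CANONICAL_MOTIF.search(seq):
        return "canonical"
    return None


# Made-up sequences used only to exercise the patterns.
print(classify_motif("AAKHFPGHGAAAADSHAA"))  # canonical form
print(classify_motif("AAKHIPGDGAAAADQHAA"))  # modified (phosphorylase-like) form
```

As the Discussion of Hsero1941 makes clear, carrying the His-Asp dyad does not by itself predict phosphorylase activity, so a motif match of this kind is at best a way to shortlist candidates.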

Kinetic and mechanistic characterization of BglP
The other known GP in the GH3 family, C. fimi Nag3, has been subjected to mechanistic characterization to probe its proposed two-step double-displacement mechanism and has been confirmed to produce βGlc-1-P and βGlcNAc1P, although no 3D structure has been obtained. In fact, Nag3 was the first retaining β-glycoside phosphorylase described and, like BglP, was shown to also catalyze hydrolysis, but at a reduced rate relative to phosphorolysis (32). Comparison of their amino acid sequences (Fig. 5) reveals a high degree of sequence similarity between BglP and Nag3 (55% similarity and 39% identity), making it probable that the two enzymes follow the same mechanism involving a covalent α-glucosyl enzyme intermediate (Fig. 1C). To confirm this and to allow comparison of the two enzymes, a kinetic and mechanistic analysis of BglP was performed.
Kinetic parameters were determined for cleavage of three different substrates, para-nitrophenyl β-glucoside (pNPGlc), 2,4-dinitrophenyl β-glucoside (DNPGlc), and para-nitrophenyl β-N-acetylglucosaminide (pNPGlcNAc), both in the absence and presence of phosphate. From the resulting parameters (Table 1), it is apparent that BglP prefers glucoside substrates over N-acetylglucosaminides, kcat/Km values for pNPGlc being ~10-fold higher and kcat values being ~100-fold higher than those of the hexosaminide. A second distinction of the two substrate classes is that kcat for the two glucoside substrates increases over 10-fold as phosphate concentrations are raised, whereas increases for pNPGlcNAc are much more modest at 1.5-2-fold. Km values also increase with phosphate, by up to 20-30-fold for the glucosides, with the net consequence being that kcat/Km values remain approximately constant. These observations for the glucosides reflect classic kinetic behavior for a ping-pong mechanism with substrates for which the second step (cleavage of the glycosyl enzyme) is rate-limiting (Fig. 1C) (34). Introduction of an alternative, better, nucleophile (phosphate) into the reaction accelerates the decomposition of the glycosyl enzyme intermediate through provision of a second pathway (k3P). The net effect is to not only raise kcat values but also to decrease the accumulation of the glycosyl-enzyme intermediate, thereby raising the Km value. By contrast the kcat/Km value, which can be expressed in terms shown in Equation 1, reflects the first irreversible step and is thus unaffected by steps occurring later in the pathway.
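The argument that phosphate raises both kcat and Km while leaving kcat/Km unchanged can be checked numerically. The sketch below uses the standard textbook steady-state expressions for a two-step covalent mechanism, E + S ⇌ ES → E-Glc → E + product, which may differ in notation from the paper's Equation 1. Two further assumptions are made: the phosphate pathway is treated as first order in phosphate (no saturation), and all rate constants are invented values chosen only to make deglycosylation rate-limiting.

```python
def ping_pong_parameters(k1, k_minus1, k2, k3w, k3p, phosphate_mM):
    """Steady-state kcat, Km and kcat/Km for E + S <-> ES -> E-Glc -> E + P.

    k3w is the water (hydrolysis) deglycosylation rate constant (s^-1).
    k3p is an assumed second-order constant for the phosphate pathway
    (s^-1 mM^-1), so that k3 = k3w + k3p * [phosphate].
    """
    k3 = k3w + k3p * phosphate_mM
    ks = (k_minus1 + k2) / k1          # (k-1 + k2) / k1
    kcat = k2 * k3 / (k2 + k3)
    km = ks * k3 / (k2 + k3)
    return kcat, km, kcat / km         # kcat/Km reduces to k1*k2/(k-1 + k2)


# Hypothetical constants (not fitted to the paper's data).
constants = dict(k1=1.0, k_minus1=100.0, k2=50.0, k3w=0.5, k3p=0.1)

kcat_0, km_0, ratio_0 = ping_pong_parameters(phosphate_mM=0.0, **constants)
kcat_50, km_50, ratio_50 = ping_pong_parameters(phosphate_mM=50.0, **constants)

print(f"0 mM Pi:  kcat={kcat_0:.3f} s^-1, Km={km_0:.2f} mM, kcat/Km={ratio_0:.4f}")
print(f"50 mM Pi: kcat={kcat_50:.3f} s^-1, Km={km_50:.2f} mM, kcat/Km={ratio_50:.4f}")
```

With these made-up constants kcat and Km both rise about 10-fold between 0 and 50 mM phosphate, while kcat/Km stays fixed at k1·k2/(k-1 + k2), which is the qualitative pattern described for the glucoside substrates.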
These kinetic studies therefore strongly support the presumed ping-pong mechanism for BglP. They also suggest that, for the glucosides tested and certainly for DNPGlc, the rate-limiting step is the turnover of the glucosyl-enzyme intermediate. However, the smaller effect of phosphate on kcat and Km values for pNPGlcNAc might suggest that the formation of the glycosyl-enzyme intermediate remains at least partially rate-limiting for this substrate. This implies that the presence of the acetamide moiety lowers reaction rates, possibly due to different interactions with the signature loop. If this is the case then measuring the pre-steady-state phase of BglP-catalyzed hydrolysis of the glucoside and N-acetylglucosaminide substrates should reveal substantial differences between the substrates. The use of 6,8-difluoromethylumbelliferyl β-glucoside (DFMUGlc) and 6,8-difluoromethylumbelliferyl β-N-acetylglucosaminide (DFMUGlcNAc) improved signal sensitivity and substrate reactivity (phenol pKa = 4.7) (35) over their pNP counterparts. Indeed, a clear burst phase was observed for cleavage of DFMUGlc, but not for DFMUGlcNAc, supporting the above assignment of rate-determining steps (Fig. 6D). Direct evidence for a two-step covalent mechanism was then sought by using electrospray ionization mass spectrometry to detect the covalent glycosyl-enzyme intermediate. As seen in Fig. 7A, the BglP mass increased by 163 Da, equivalent to a glucosyl moiety, upon incubation with DNPGlc, consistent with formation of the covalent intermediate. Notably, the presence of phosphate, which induces rapid turnover of the intermediate, resulted in disappearance of the corresponding glycosyl-enzyme mass.

The conserved GH3 hexosaminidase sub-group sequence motif is indicated (*, bold letters), and the catalytic nucleophile residues are denoted (F). Black shading indicates highly conserved residues, and gray shading indicates conserved similar residues. Sequence alignment was generated using T-Coffee (http://tcoffee.crg.cat/apps/tcoffee/do:regular), and shading was done in BoxShade version 3.21 (http://www.ch.embnet.org/software/BOX_form.html) by K. Hofmann and M. Baron.

Having confirmed a two-step mechanism involving a glycosyl-enzyme intermediate, we sought to lengthen the lifetime of the intermediate to allow further mechanistic and structural studies. This was achieved by use of 2,4-dinitrophenyl 2-deoxy-2-fluoro-β-D-glucopyranoside (DNP2FGlc) as a slow substrate for which the deglycosylation step (k3) is much slower than the glycosylation step (k2). This arises as a consequence of inductive destabilization of the oxocarbenium ion-like transition states by the C2 fluorine, which slows both steps (k2 and k3), whereas the incorporation of an excellent (DNP) leaving group ensures that k2 > k3 and thus that the intermediate accumulates (36). Indeed, incubation of BglP with DNP2FGlc in the absence of phosphate resulted in time-dependent pseudo-first-order inactivation of the enzyme, as shown in Fig. 7B. The rate of inactivation varied, in a saturable manner, with the concentration of the 2-fluoro-sugar, allowing extraction of kinetic parameters for inactivation of BglP of ki = 0.17 min⁻¹ and Ki = 32 mM (Fig. S2A). The 2-fluoro-glucosyl-BglP (2FGlc-BglP) covalent intermediate species was shown to be mechanistically relevant by removal of excess inactivator and then measuring rates of reactivation by assaying aliquots of the enzyme as a function of time in buffer containing increasing concentrations of phosphate. As can be seen in Fig. 7C, phosphate did indeed stimulate reactivation of the enzyme in a time-dependent fashion. The plot of kP versus [phosphate] showed no saturation behavior, revealing that binding of inorganic phosphate is weak. However, the slope of the line yielded a value for the second-order rate constant for reactivation of kP/KP = 3.2 × 10⁻⁵ min⁻¹ mM⁻¹ (Fig. S2B). This absence of saturation at phosphate concentrations of 100 mM might seem to be a concern, given the apparent saturation binding behavior of phosphate seen in Fig. 6. However, it is clear from the difference in maximal rates observed between DNPGlc and pNPGlc in Fig. 6, A and B, that the curvature seen in these plots has its origin in changes in rate-limiting step as phosphate concentrations increase: the glycosylation rate constant (k2) for DNPGlc is greater than that for pNPGlc.
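To make the inactivation and reactivation constants above more concrete, the following sketch evaluates the usual saturable expression for pseudo-first-order inactivation, k_obs = ki·[I]/(Ki + [I]), and the linear reactivation rate (kP/KP)·[phosphate]. The three constants are the values reported in the text; the function names and the example concentrations are assumptions chosen for illustration.

```python
import math

K_INACT = 0.17        # ki, min^-1 (reported in the text)
K_I = 32.0            # Ki, mM (reported in the text)
KP_OVER_KP = 3.2e-5   # kP/KP, min^-1 mM^-1 (reported in the text)


def k_obs_inactivation(inactivator_mM, ki=K_INACT, Ki=K_I):
    """Pseudo-first-order inactivation rate constant at a given [DNP2FGlc]."""
    return ki * inactivator_mM / (Ki + inactivator_mM)


def residual_activity(t_min, inactivator_mM):
    """Fraction of activity remaining after t_min of incubation (no phosphate)."""
    return math.exp(-k_obs_inactivation(inactivator_mM) * t_min)


def k_reactivation(phosphate_mM, second_order=KP_OVER_KP):
    """First-order reactivation rate constant, assuming no phosphate saturation."""
    return second_order * phosphate_mM


# Example: at [I] = Ki the observed rate constant is half of ki.
half_life_min = math.log(2) / k_obs_inactivation(32.0)
print(f"k_obs at 32 mM inactivator: {k_obs_inactivation(32.0):.3f} min^-1")
print(f"inactivation half-life at 32 mM: {half_life_min:.1f} min")
print(f"k_react at 100 mM phosphate: {k_reactivation(100.0):.1e} min^-1")
```

Comparing the two printed rate constants shows how much slower phosphate-driven turnover of the 2-fluoroglucosyl enzyme is than its formation, which is why the intermediate accumulates.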

X-ray crystal structure of BglP
Crystals of unliganded BglP were obtained through hanging drop vapor diffusion in a mother liquor containing 5 mg/ml BglP and 27% polyethylene glycol 1000, and the structure was solved to a resolution of 2.1 Å (Table 2). BglP is a 567-aa protein that adopts a two-domain fold (Fig. 8A) and is monomeric in solution according to gel filtration (data not shown). The N-terminal domain (aa 1-392) forms a (β/α)8 TIM barrel structure typical of many glycoside hydrolase catalytic domains, with an active site composed of residues within loops that radiate out from the C-terminal ends of the β-strands of the barrel. The C-terminal domain (aa 393-562) adopts an α/β sandwich that does not participate directly in catalysis, with the exception of amino acids 476-487, which pack along the edge of the TIM barrel active site to stabilize a loop that emanates from β-strand 5 of the barrel and bears the sequence KH(FI)PGDGX4DQH.
Error bars indicate standard error of the mean; where the bars are not present, the error is smaller than the symbols used. Errors were derived from the fit to the experimental data provided by GraphPad. Kinetic parameters for cleavage of the aryl glycoside substrates at each concentration of phosphate can be found in Table 1. D, pre-steady-state burst phase analysis of the cleavage of DFMUGlc (red) and DFMUGlcNAc (gray). DFMUGlc (10 μM) or DFMUGlcNAc (10 μM) was incubated in a fluorimeter pre-chilled to 12°C. DFMU release was measured as a function of time after the addition of 42 nM (for DFMUGlc) or 420 nM (for DFMUGlcNAc) BglP at ~1 min. Fluorescent signal was converted to DFMU concentration using a standard curve of free DFMU at pH 7.0.

Table 1
Kinetic parameters for the reaction of BglP with pNPGlcNAc, pNPGlc, and DNPGlc
Reactions were carried out in storage buffer B and the indicated concentration of potassium phosphate, pH 7.0, at 25°C. Molar extinction coefficients at 400 nm are as follows: pNP = 7280 M⁻¹ cm⁻¹ and DNP = 12,460 M⁻¹ cm⁻¹. Parameters were calculated using the following: V0 = [E]·kcat·[S]/(Km + [S]). Standard error was calculated from three replicates. ND means not determined.
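The caption gives the two ingredients needed to turn raw absorbance traces into kinetic parameters: the extinction coefficients at 400 nm and the Michaelis-Menten rate law. A minimal sketch of both steps is shown below. The 1 cm path length is an assumption (microplate readers typically have shorter, volume-dependent path lengths), and the function names are hypothetical.

```python
# Molar extinction coefficients at 400 nm from the table caption (M^-1 cm^-1).
EPSILON_400 = {"pNP": 7280.0, "DNP": 12460.0}


def rate_from_absorbance(delta_a400_per_min, chromophore, path_cm=1.0):
    """Convert an absorbance slope (A400/min) to a rate in M/min (Beer-Lambert).

    path_cm defaults to an assumed 1 cm cuvette path length.
    """
    return delta_a400_per_min / (EPSILON_400[chromophore] * path_cm)


def michaelis_menten(enzyme, kcat, km, substrate):
    """Initial rate V0 = [E] * kcat * [S] / (Km + [S]), as given in the caption."""
    return enzyme * kcat * substrate / (km + substrate)


# Example: a slope of 0.1246 A400/min for DNP release corresponds to 10 uM/min.
print(rate_from_absorbance(0.1246, "DNP"))

# Example: at [S] = Km the rate is half of [E] * kcat.
print(michaelis_menten(enzyme=1.0, kcat=2.0, km=5.0, substrate=5.0))
```

In practice kcat and Km would be obtained by fitting the second function to rates measured across a range of substrate concentrations, which the text indicates was done in GraphPad.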

This loop contains the sequence that defines the GH3 subgroup to which BglP belongs (albeit with the two alterations noted previously) and accommodates the general acid/base His-206 (Fig. 8B).

Substrate recognition
To gain structural insights into the mechanism and specificity of BglP, crystals of the enzyme were soaked with DNP2FGlc to form the long-lived glycosyl-enzyme intermediate that had been characterized kinetically. The structure so determined revealed clear electron density for a 2-deoxy-2-fluoro-α-D-glucosyl moiety in a ⁴C₁ chair conformation covalently bound via C1 to Asp-288 (Fig. 8B). This residue is conserved in GH3 enzymes and is indeed suitably positioned within the BglP active site to serve as the catalytic nucleophile forming the key reaction intermediate. The active site contains most of the same residues identified in other GH3 enzymes (37) as being involved in substrate recognition, such as Lys-193 and His-194 of the signature sequence. Also consistent with other members of the GH3 sub-family to which BglP belongs, the imidazole of a histidine residue (His-206) is positioned ~5.5 Å from the carboxylate of Asp-288 on the opposite face of the hexose ring of the bound intermediate and is thus well-positioned to assist in catalysis by acting as a general acid/base. It first protonates the oxygen of the scissile bond to assist leaving group departure, followed, in this case, by activation of an incoming phosphate that reacts with the covalent glycosyl-enzyme intermediate to generate a phosphorylated product with net retained anomeric stereochemistry (Fig. 8B). The ~5.5 Å distance between Asp-288 and His-206 of BglP is consistent with glycosidases that employ a configuration-retaining mechanism (38). One key difference, however, is the identity of the residue interacting with the C2-position of the sugar, this being the amino acid flanked by His-206 and Asp-204 (the so-called His-Asp catalytic dyad) in the signature sequence. In the GH3 GlcNAc-cleaving enzymes, such as the NagZs, the residue at that position is a serine, although in BglP it is a glutamine, as seen in the structural overlay of Fig. 8C wherein the 2FGlc-BglP structure is superimposed on the structures of two other GH3 β-N-acetylglucosaminidases (NagZ from Burkholderia cenocepacia in complex with GlcNAc (PDB CODE 4GNV) and NagZ from 3NVD)). The overlay reveals that Gln-205 of BglP is likely to clash with the NAc of GlcNAc-based substrates, as the amide of the Gln side chain sits ~2 Å from the methyl group of the NAc from GlcNAc and PUGNAc. In the natural glucosyl intermediate, where a hydroxyl group is present at C2, we predict the Gln-205 amide would be 3.0-3.5 Å away from the hydroxyl group, suggesting that Gln-205 could form a hydrogen bond with the C2 hydroxyl group of glucoside substrates. In both cases, the interaction at that position is between an amide and a hydroxyl, but the directionality of the interaction is inverted, a beautiful example of specificity swapping. Indeed, it had previously been speculated that the broadened substrate specificity seen for Nag3, which also prefers glucoside to N-acetyl-hexosaminide substrates, was due to the substitutions seen in the modified sequence motif (31): KH(FI)PGDGX4DQH (underlined residues correspond to Asp-193 and Gln-200 in Nag3). The other differing residue, Asp-198, is not close to the active site and is thus unlikely to be involved in the broadened substrate specificity of the enzyme.
To test whether Gln-205 indeed plays a role in discriminating between glucoside and GlcNAc substrates, it was replaced by serine in BglP using site-directed mutagenesis to form the mutant BglP-Q205S. Determination of the kinetic parameters for the cleavage of pNPGlc, pNPGlcNAc, and DNPGlc by BglP-Q205S (Fig. 9, A-C) revealed that the Gln to Ser mutation caused the preferred substrate to switch from pNPGlc to pNPGlcNAc as shown graphically in Fig. 9D. Kinetic parameters determined both in the absence and presence of 50 mM phosphate are shown in Table 3. The kcat and kcat/Km values of the Q205S mutant for pNPGlcNAc are both ~10-fold higher than those of the wildtype, with no obvious stimulation from phosphate. However, TLC analysis confirms that GlcNAc1P is the predominant product in the presence of phosphate, and thus the rate-limiting step for the mutant with the GlcNAc substrate remains the glycosylation step (Fig. S3). The opposite situation is seen for pNPGlc, for which only kcat/Km values could be obtained because saturation was never reached, even at 50 mM substrate. The kcat/Km of the Q205S mutant is ~10-fold lower than that of wildtype, and again no significant effect of phosphate on rates was observed, pointing to rate-limiting formation of the covalent intermediate. For DNPGlc, a clear stimulation of activity by phosphate could be seen in the mutant, and once again kcat (0 mM phosphate) and kcat/Km (50 mM phosphate) are ~10-fold lower than their counterparts for the wildtype enzyme. The presence of the better leaving group (i.e. DNP) accelerates the glycosylation step sufficiently that deglycosylation (k3) becomes rate-limiting, and thus stimulation by phosphate is observed.

Phosphate recognition
Hoping to gain experimental insights into the structure of the product complex, we tried to obtain a crystal structure of BglP bound to 2-deoxy-2-fluoro-β-glucose-1-phosphate. Soaking BglP crystals with this analog (times ranging from 5 min to 1 h) either generated the same 2-fluoroglucosyl-enzyme intermediate species described above, or nothing was found bound in the active site. We therefore resorted to manual docking studies to predict how BglP accommodates a phosphoryl group in its active site. The product βGlc-1-P was modeled into the active site by superimposing the hexose ring of Glc onto the experimentally determined hexose ring of the covalently bound 2-fluoroglucosyl moiety (Fig. 10). When modeled in the energetically favored ⁴C₁ chair conformation, the phosphate of βGlc-1-P sterically clashed with the side chain of Met-292. This was not too surprising, however, considering that retaining glucosidases are known to assist bond cleavage by distorting the substrate to position the leaving group in a pseudoaxial position, which brings the conformation of the substrate closer to that of the reaction transition state (39). Thus, the repulsive interactions between Met-292 and the phosphate of βGlc-1-P, in conjunction with the favorable binding interactions between the substrate glucosyl and phosphate moieties, likely force the phosphate of the bound β-Glc-1-phosphate product into a pseudoaxial position. This is also likely to be the case for the bound oligosaccharide substrate. Interestingly, when this distortion is accounted for in our model by distorting the Glc ring toward a ¹S₃ skew-boat conformation (based on a crystal structure of a GH3 NagZ from B. subtilis (40)), which places the phosphate group pseudoaxial, the clash with Met-292 is relieved with no other steric interferences arising. The pseudoaxial position also places the oxygen atom of the phosphoester linkage of the product within ~3.2 Å of the imidazole of His-206 (Fig. 10), which is predicted to be where the oxygen of the glycosidic bond of the substrate would reside when it is protonated during bond cleavage. If the enzyme were to employ a glutamic acid residue as acid/base catalyst, as is the case for the majority of GH3 glycosidases, there would likely be significant Coulombic repulsion with the substrate phosphate. This Glu to His substitution was proposed earlier as a modification that gave members of the sub-group of GH3 enzymes the ability to carry out phosphorolysis rather than hydrolysis (32). Recently, Ducatti et al. (41) demonstrated that although the Glu to His substitution may be necessary for phosphorylase action, it is not itself sufficient, and thus is not predictive of phosphorylase activity. They showed that a GH3 β-N-acetylglucosaminidase from Herbaspirillum seropedicae SmR1 (Hsero1941) bearing the His-Asp dyad functions predominantly as a hydrolase and observed no activity stimulation or phosphorylated products in the presence of phosphate. They also observed that Hsero1941 showed greater activity toward pNPGlcNAc (kcat = 1.2 s⁻¹) than pNPGlc (kcat = 3.3 × 10⁻³ s⁻¹), consistent with the fact that the residue sandwiched between Asp and His in their signature sequence is indeed Ser.

Discussion
The screen, developed and validated with three known β-glycoside phosphorylases, proved to be robust, simple to implement, and useful in identifying active clones within a metagenomics library. From a relatively small number of initial hits, a single β-glucoside phosphorylase belonging to CAZy family GH3 was discovered. This represents the first reported high-throughput functional metagenomic screen for glycoside phosphorylases, and the approach used may well be amenable to a range of other phosphorylases. Given successful implementation of the screening paradigm on known β-glucoside phosphorylases from GH94, and the relatively high number of initial hits, it was somewhat surprising that GH94 enzymes were not identified in the BCR library. It is unlikely that this is due to the activity stimulation by phosphate being masked through some other step being rate-limiting, as can be seen for the retaining enzyme studied here, because GH94 phosphorylases are inverting with a single chemical step. A possible explanation why no GH94s were discovered is simply that none are present within the BCR library. This is consistent with the partial sequence information available from this library (from end-sequencing of fosmids), which contained no predicted GH94 ORFs. A second possible explanation is that those GH94s potentially present in the library have limited expression, which in many cases is due to the E. coli host RNA polymerase's inability to recognize foreign promoter sequences on the fosmid DNA (42,43). This limitation can be compensated for by equipping the host strain with additional factors that help it recognize a wider range of promoter sequences (44); however, the host strain used in this study contained only the native E. coli factors. A third possibility is that those GH94s that are expressed are not capable of cleaving DNPGlc. To overcome this issue, a new functional screen would need to be devised that would utilize natural substrates as opposed to aryl glycosides. This could involve detection of sugar phosphate products when assayed in the degradative direction or oligosaccharides when assayed in the synthetic mode.

Table 3. D, histogram showing kcat/Km values for hydrolysis of pNPGlc and pNPGlcNAc by BglP and BglP-Q205S.

Table 3
Kinetic parameters for the reaction of BglP.Q205S with pNPGlcNAc, pNPGlc, and DNPGlc
Reactions were carried out in storage buffer B supplemented with 0 or 50 mM phosphate, pH 7.0, at 25°C. Molar extinction coefficients at 400 nm are as follows: pNP = 7280 M⁻¹ cm⁻¹ and DNP = 12,460 M⁻¹ cm⁻¹. Parameters were calculated using the following: V0 = [E]·kcat·[S]/(Km + [S]). ND means not determined.

Substrate
The reason for the false positives (clones that showed phosphate-dependent DNPGlc cleavage but failed to produce Glc-1-P) is not known at this stage. The most probable explanation is that these are glycosidases whose activity is enhanced by high phosphate levels, either through simple salt effects or through some form of allosteric interaction. Indeed, apart from 29K06 and 31P01, each clone analyzed by TLC had been previously identified through functional screens targeting glycoside hydrolases and subsequently sequenced (Table S3). Given that two of the non-GP GH3 ORFs were each independently found on three separate fosmids (see Fig. S4), it is likely these are simple β-glucosidases whose activity is in some way stimulated by the presence of phosphate.
Although the screen of 17,168 fosmid clones identified only one new GP, this enzyme proved interesting: it is only the second β-retaining glycoside phosphorylase of any kind to be identified and characterized, and the first for which a three-dimensional structure has been determined. BglP seems to be specialized toward β-glucoside substrates, generating a βGlc-1-P product, with much lower activities against N-acetylhexosaminides than Nag3 has. Interestingly, the gene immediately flanking bglP on the fosmid encodes a β-phosphoglucomutase. We cloned and expressed this gene and confirmed that the enzyme converts βGlc-1-P to glucose 6-phosphate but has no activity on βGlcNAc-1-P. The typical phosphate levels in bacteria of 20-30 mM (45) should be sufficient for the two genes together to feed glucosyl moieties directly into the glycolytic pathway without the need for expenditure of any additional ATP.
Prior to our determination of the structure of BglP bound to a glucoside analog, it was puzzling that BglP and Nag3, both of which appeared to be β-N-acetylglucosaminidases on the basis of a sequence motif similar to that seen in other GH3 N-acetylhexosaminidases, are more active against glucoside substrates than against N-acetylhexosaminides. Mayer et al. (31) had speculated that this difference in substrate specificity (in the case of Nag3) was due to residues Asp-193 and Gln-200 because they differed from those in the conserved sequence motif (KH(FI)PGDGX4DQH; the altered residues are the D of GDG and the Q of DQH), although at the time the group had no structures of enzymes from this sub-group and was unaware that Nag3 was a phosphorylase (31). Because BglP contains the same altered sequence motif, our structural study provided insight into the functional significance of the homologous residues in BglP (Asp-198 and Gln-205). By comparing the structure of BglP with two NagZ structures having the fully conserved motif (KH(FI)PG(HL)GX4D(ST)H), we concluded that Gln-205 was likely causing steric hindrance with a C2 N-acetyl group, while at the same time the NH2 moiety of the Gln side chain was in a good position to H-bond with a C2 hydroxyl group. Indeed, mutating Gln-205 to a serine (the amino acid present in the fully conserved GH3 N-acetylhexosaminidase motif) cleanly switched the substrate preference from glucosides to N-acetylhexosaminides. Thus, through the development of a novel screen for glycoside phosphorylases, a new class of enzyme has been discovered and characterized both structurally and mechanistically. This in turn has provided new insights into simple ways in which enzyme specificity can be encoded and swapped, while opening up a screening paradigm for the discovery of wide-ranging glycoside phosphorylases with biotechnological potential.
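The two sequence motifs compared above can be written as regular expressions. The following is a minimal sketch, not a method used in the study. The function name and the three peptide fragments are hypothetical and serve only to show how the altered (Nag3/BglP-type) and fully conserved (NagZ-type) motifs differ at two positions.

```python
import re

# Fully conserved GH3 N-acetylhexosaminidase motif: KH(FI)PG(HL)GX4D(ST)H
CONSERVED_MOTIF = re.compile(r"KH[FI]PG[HL]G.{4}D[ST]H")
# Altered motif shared by Nag3 and BglP: KH(FI)PGDGX4DQH
ALTERED_MOTIF = re.compile(r"KH[FI]PGDG.{4}DQH")


def classify_gh3_motif(sequence):
    """Return 'altered', 'conserved', or 'none' for a protein sequence."""
    seq = sequence.upper()
    if ALTERED_MOTIF.search(seq):
        return "altered"
    if CONSERVED_MOTIF.search(seq):
        return "conserved"
    return "none"


# Hypothetical peptide fragments, made up for illustration only.
bglp_like = "TTKHFPGDGAVSTDQHLA"       # carries the altered D...DQH form
nagz_like = "TTKHIPGHGAVSTDSHLA"       # carries the conserved H...DSH form
q_to_s_variant = "TTKHFPGDGAVSTDSHLA"  # Gln-to-Ser change only, Asp retained

for name, seq in [("bglp_like", bglp_like),
                  ("nagz_like", nagz_like),
                  ("q_to_s_variant", q_to_s_variant)]:
    print(name, classify_gh3_motif(seq))
```

A single Gln-to-Ser substitution yields a sequence that matches neither pattern, because the Asp in the GDG position is retained. A motif search of this kind would therefore need a third pattern to recognize such variants.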

Protein expression and purification
C. fimi Nag3 was expressed and purified as reported previously (32). Two liters (RtCBP, RtCDP, and BglP.Q205S) or 4 liters (BglP) of LB media containing 25 μg/ml kanamycin were inoculated with 1/100 volume of overnight culture. Expression cultures were grown at 37°C until A600 = 0.5 (~3 h). Cells were induced with 0.5 mM IPTG and grown for an additional 3 h at 37°C (RtCBP and RtCDP) or 18 h at 16°C (BglP and BglP.Q205S). Cells were harvested by centrifuging at 6000 × g for 6 min in a Beckman Coulter Avanti J-E floor centrifuge (JA-10 rotor) followed by resuspension (20 ml per 1 liter of original culture volume) in loading buffer A (RtCBP and RtCDP) or loading buffer B (BglP and BglP.Q205S) (Table S2). Cells were lysed with an Avestin C3 homogenizer with an average cell pressure of 16,000 p.s.i. The soluble fraction was isolated by centrifuging the lysate at 15,000 rpm for 30 min (JA-20 rotor). The soluble fraction was either stored at −20°C until needed or immediately purified. Purification was carried out by immobilized metal affinity chromatography on a GE Healthcare ÄKTA FPLC equipped with a UV and conductance detector and an automatic fraction collector. Buffers used for protein purification are detailed in Table S2. For each enzyme a separate 5-ml HisTrap™ FF column (GE Healthcare) was equilibrated with 10 column volumes of loading buffer A (RtCBP and RtCDP) or loading buffer B (BglP and BglP.Q205S). Soluble cell lysates were applied to the columns using a P-1 peristaltic pump (GE Healthcare) followed by a wash step of 10 column volumes (CV) of the respective loading buffers. The columns were then transferred to the ÄKTA and washed again with 10 CV of wash buffer A (RtCBP and RtCDP) or wash buffer B (BglP and BglP.Q205S) followed by equilibration with 10 CV of the respective loading buffers.
Proteins were eluted using a 20-ml gradient (0-100%) of loading buffer to elution buffer with the automatic fraction collector set to collect 1-ml fractions. Fractions were analyzed by SDS-PAGE, and those showing the strongest band at 94 kDa for RtCBP, 113 kDa for RtCDP, or 65 kDa for BglP and BglP.Q205S were combined and concentrated using an Amicon Ultra-4 MWCO 30-kDa centrifugal filter (Sigma), then dialyzed against storage buffer A (RtCBP and RtCDP) or storage buffer B (BglP and BglP.Q205S), and stored at −70°C.

RtCBP, RtCDP, and Nag3 DNPGlc phosphorolysis assay
Spectroscopic assay. DNPGlc (2 mM) was combined with 25 μg of purified RtCBP, RtCDP, or Nag3 and 0 or 50 mM phosphate in storage buffer A (200-μl reaction volume) and incubated at 37°C for 1 h. A400 measurements were performed in matched 1-cm path length quartz cuvettes using a Varian Cary 300 Bio UV-visible spectrophotometer with an automatic cell changer and circulating water bath.
TLC assay. DNPGlc (20 mM) was combined with 37.5 μg of purified RtCBP, RtCDP, or Nag3 and 0 or 50 mM phosphate in storage buffer A (10-μl reaction volume) and incubated at 37°C for 2 h. Samples (0.5 μl) of each reaction were spotted onto TLC plates and run and visualized as described above.

BCR library functional screen
The screening methodology was modeled on the functional metagenomic screen reported by Mewis et al. (25). A generalized workflow of the screening method is shown in Fig. 3. BCR library plates were replicated into fresh 384-well plates (Nunc™ 384-well clear polystyrene plates, non-treated) containing 50 μl of LB media per well with 100 μg/ml arabinose and 12.5 μg/ml chloramphenicol using a QPix2 robot. The replicated plates were grown for 18 h at 37°C. The assay was performed by adding 50 μl of 2× assay buffer S (Table S2) and incubating at 37°C for 6 h; then absorbance measurements were taken at 400 nm. The 56 fosmid clones that displayed an A400 value greater than mean + 4 S.D. were re-arrayed to a 96-well GP master plate.
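The hit-calling rule (A400 greater than mean + 4 S.D.) can be sketched as follows. This is an illustrative sketch under stated assumptions: the text does not say whether the mean and S.D. were computed per plate or across the library, nor whether a sample or population S.D. was used. The sketch assumes a per-plate sample S.D. The function name and the absorbance readings are made up.

```python
from statistics import mean, stdev


def call_hits(a400_by_well, n_sd=4.0):
    """Return (cutoff, wells whose A400 exceeds mean + n_sd * S.D.)."""
    values = list(a400_by_well.values())
    # Assumption: sample standard deviation over all wells of one plate.
    cutoff = mean(values) + n_sd * stdev(values)
    hits = sorted(well for well, a in a400_by_well.items() if a > cutoff)
    return cutoff, hits


# Made-up readings for 24 wells: low background plus one active clone.
readings = {f"A{i + 1:02d}": 0.10 + 0.001 * (i % 5) for i in range(24)}
readings["A07"] = 1.50

cutoff, hits = call_hits(readings)
print(f"cutoff = {cutoff:.3f}, hits = {hits}")
```

Because a strong outlier inflates both the mean and the S.D., a rule of this kind is conservative on plates with few wells or with several active clones.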

BCR GP master library screen and validation
GP master library screen. The BCR GP master plate was replicated to six identical 96-well plates (Costar 96-well flat-bottom polystyrene plate) containing 100 μl of LB media per well with 100 μg/ml arabinose and 12.5 μg/ml chloramphenicol and then grown for 18 h at 37°C. The plates were then screened in the presence and absence of 50 mM phosphate, each condition in triplicate. Either 100 μl of 2× assay buffer P or 2× assay buffer H (Table S2) was added to each well and incubated at 37°C for 6 h. After the incubation period, absorbance measurements were taken at 400 nm in a BioTek Synergy H1 hybrid microtiter 96-well plate reader. A Student's t test was used to compare the mean A400 values of each clone in the absence and presence of phosphate using GraphPad Prism version 6.0 software. Clones possessing a significantly (p < 0.05) higher activity with phosphate present were analyzed further by TLC.
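The triplicate comparison can be illustrated with a pooled-variance Student's t statistic. This is a sketch, not the Prism procedure used in the study. It assumes a two-tailed test at the 5% level, for which the critical value with 4 degrees of freedom (two triplicates) is 2.776, and it additionally requires the difference to be in the direction of higher activity with phosphate. Function names and absorbance values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

# Two-tailed 5% critical value of Student's t for 4 degrees of freedom.
T_CRIT_DF4 = 2.776


def student_t(a, b):
    """Pooled-variance two-sample t statistic for mean(a) - mean(b)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    if pooled == 0:
        raise ValueError("zero variance in both groups; t is undefined")
    return (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))


def phosphate_stimulated(with_pi, without_pi):
    """True if triplicate A400 values are significantly higher with phosphate."""
    if len(with_pi) != 3 or len(without_pi) != 3:
        raise ValueError("critical value above is valid for triplicates only")
    return student_t(with_pi, without_pi) > T_CRIT_DF4


# Made-up A400 triplicates for two clones.
clone_a = phosphate_stimulated([0.90, 0.95, 1.00], [0.30, 0.32, 0.31])
clone_b = phosphate_stimulated([0.30, 0.35, 0.32], [0.31, 0.33, 0.34])
print(clone_a, clone_b)
```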
GP TLC validation. Two 100-μl samples of overnight cell culture from each of the 12 fosmid clones were spun down, and the supernatant was removed. Each cell pellet pair was resuspended with 10 μl of 2× assay buffer H, 2 μl of 100 mM DNPGlc, and 8 μl of distilled H2O or 8 μl of 125 mM potassium phosphate, pH 7.0, and incubated for 2 h at 37°C. Samples (0.5 μl) of each reaction condition were spotted onto TLC plates and run as described above.
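The final concentrations in the 20-μl TLC validation reactions follow directly from the stated volumes. The short calculation below is a worked example; it assumes the cell pellet contributes negligible volume, and the helper name is hypothetical.

```python
def final_concentration(stock, volume_added_ul, total_volume_ul):
    """Dilute a stock (any concentration unit) into the total reaction volume."""
    return stock * volume_added_ul / total_volume_ul


# 10 ul of 2x buffer + 2 ul of DNPGlc + 8 ul of water or phosphate
total_ul = 10 + 2 + 8

dnpglc_mM = final_concentration(100, 2, total_ul)     # 100 mM stock
phosphate_mM = final_concentration(125, 8, total_ul)  # 125 mM stock
buffer_x = final_concentration(2, 10, total_ul)       # 2x assay buffer H

print(dnpglc_mM, phosphate_mM, buffer_x)
```

The resulting 50 mM phosphate matches the concentration used in the plate-based screens.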

Full fosmid sequencing
Fosmid DNA was extracted from clones FOS62_29K06 and FOS62_31P21 using the GeneJET plasmid miniprep kit (Thermo Fisher Scientific) according to the manufacturer's instructions. Fosmid preparations were further treated with PlasmidSafe DNase (Epicenter) to degrade contaminating E. coli chromosomal DNA. DNA concentrations were measured with the Quant-iT™ dsDNA HS assay kit (Invitrogen) using a Qubit fluorometer (Invitrogen). For full fosmid sequencing, 2.4 ng of each fosmid was sent to the University of British Columbia Sequencing Centre (Vancouver, Canada). Each fosmid was individually barcoded and sequenced using the MiSeq system.

Contig assembly, open reading frame prediction, and gene identification
All Illumina MiSeq raw sequence data were trimmed and assembled using a Python script available on GitHub at https://github.com/hallamlab/FabFos. Briefly, Trimmomatic was used to remove adapters and low-quality sequences from the reads (50). These reads were screened for vector and host sequences using Burrows-Wheeler Aligner (BWA) (51) and then filtered using SAMtools and a bam2fastq script to remove contaminants. These high-quality and purified reads were assembled by MEGAHIT with k-mer values ranging between 71 and 241, increasing by increments of 10 (52). Because these libraries often had in excess of 20,000-fold coverage, the minimum k-mer multiplicity was set to 1% of the estimated coverage of a fosmid to prevent accumulated sequencing errors from interfering with proper sequence assembly. Outside of the Python script, assemblies that yielded more than one contig were then scaffolded using minimus2 (53). Parameterized commands can be found both in the documentation on the GitHub page and in the Python script itself. Fosmid ORFs were identified using the metagenomic version of Prodigal (54) and compared with the CAZy database using BLASTP as part of the MetaPathways version 2.5 software package (55). MetaPathways parameters are as follows: length >60 bp; BLAST score >20; BLAST score ratio >0.4; and E value < 1 × 10⁻⁶.
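The assembly parameters described above reduce to two small calculations, sketched here for illustration. This is not code from the FabFos script; the function names are hypothetical, and rounding the 1% value to the nearest integer (with a floor of 1) is an assumption, since the text does not state how fractional values were handled.

```python
def kmer_series(k_min=71, k_max=241, step=10):
    """k-mer sizes from k_min to k_max inclusive, in increments of step."""
    return list(range(k_min, k_max + 1, step))


def min_kmer_multiplicity(estimated_coverage, fraction=0.01):
    """Minimum k-mer multiplicity as a fraction of estimated fosmid coverage.

    Assumption: rounded to the nearest integer and never below 1.
    """
    return max(1, round(estimated_coverage * fraction))


ks = kmer_series()
print(len(ks), ks[0], ks[-1])
print(min_kmer_multiplicity(20_000))
```

At 20,000-fold coverage, a k-mer would need to be observed about 200 times to be retained, which discards most k-mers that arise from sequencing errors.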

Kinetic analysis
Kinetic parameters were determined by measuring enzymatic reaction rates with the chromogenic substrates and monitoring the change in absorbance at 400 nm. Measurements were performed in triplicate in matched 1-cm path length quartz cuvettes using a Varian Cary 300 Bio UV-visible spectrophotometer with an automatic cell changer and temperature controller, at 25°C in buffer B (Table S2) containing the indicated concentration of substrate and potassium phosphate, pH 7.0. 150 μl of buffer B was premixed with 20 μl of 10× phosphate solution and 10 μl of 20× substrate solution. The reaction was initiated by the addition of 20 μl of 2 μM BglP or BglP.Q205S for DNPGlc, 20 μM BglP or BglP.Q205S for pNPGlc, or 50 μM BglP or 20 μM BglP.Q205S for the pNPGlcNAc reaction, and the change in absorbance at 400 nm was measured for 5 min. Hydrolysis/phosphorolysis rates were calculated by measuring absorbance changes as a function of time and converting these to concentration with the following extinction coefficients: 7280 M⁻¹ cm⁻¹ (pNP) and 12,460 M⁻¹ cm⁻¹ (2,4-dinitrophenol) at 25°C, pH 7.0. Substrate final concentrations are as follows: DNPGlc: 2.5, 10, 100, 500, 1000, and 1500 μM; pNPGlc: 0.5, 1, 2.3, 3.6, 5, 10, and 25 mM; pNPGlcNAc: 0.1, 0.25, 0.5, 1, 2, 5, and 10 mM. Concentrations of phosphate were chosen to encompass apparent Km values where possible. Non-linear regression was performed using GraphPad Prism version 6.0.
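The conversion from absorbance change to rate, and the rate equation used for fitting, can be sketched as below. This is an illustrative sketch only: the function names and the example slope are made up, and the non-linear regression itself (performed in GraphPad Prism in the study) is not reproduced. The extinction coefficients and the 200-μl reaction volume (150 + 20 + 10 + 20 μl) are taken from the text.

```python
# Molar extinction coefficients at 400 nm, pH 7.0, 25 C (M^-1 cm^-1)
EPSILON_400 = {"pNP": 7280.0, "DNP": 12460.0}


def rate_from_slope(dA400_per_min, chromophore, path_cm=1.0):
    """Beer-Lambert conversion of an absorbance slope to a rate in M/min."""
    return dA400_per_min / (EPSILON_400[chromophore] * path_cm)


def turnover_per_s(rate_M_per_min, enzyme_M):
    """Rate per enzyme molecule, v0/[E], in s^-1."""
    return rate_M_per_min / enzyme_M / 60.0


def michaelis_menten(s, e, kcat, km):
    """V0 = [E] * kcat * [S] / (Km + [S])."""
    return e * kcat * s / (km + s)


# 20 ul of 2 uM enzyme diluted into a 200-ul reaction gives 0.2 uM.
enzyme_M = 2e-6 * 20 / 200

# Hypothetical slope for a DNPGlc reaction: 0.1246 absorbance units per min.
rate = rate_from_slope(0.1246, "DNP")
print(rate, turnover_per_s(rate, enzyme_M))
```

When [S] equals Km, `michaelis_menten` returns half of [E]·kcat, which is a quick sanity check on any fitted parameters.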

Pre-steady-state kinetics
Pre-steady-state phases of the reaction of BglP with DFMUGlc and (separately) DFMUGlcNAc were monitored in a temperature-controlled fluorimeter (Varian Cary Eclipse fluorescence spectrophotometer) cooled to 12°C. Excitation at 353 nm (slit length 2.5 nm) was used, and the fluorimeter was set to continuously monitor emission at 451 nm (slit length 5 nm) for the duration of the experiment. A solution containing 980 μl of pre-chilled (12°C) assay buffer (50 mM HEPES, pH 7.0, 300 mM NaCl, 10% glycerol (v/v), 5 mM MgSO4, 0.1 mM DTT) and either 10 μM DFMUGlc or DFMUGlcNAc in a 4.5-ml plastic cuvette (Fisherbrand™ disposable plastic cuvette, UV-visible, CLR SIDE, Methacrylate) was used to establish a baseline. After ~1 min, 20 μl of BglP (2.1 μM for DFMUGlc or 21 μM for DFMUGlcNAc) was added to the assay buffer, and the reaction was allowed to proceed for 15 min. The fluorescence signal intensity was converted to DFMU concentration using a standard curve of the fluorophore in the same assay buffer.
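Converting fluorescence to DFMU concentration with a standard curve can be sketched as a least-squares line that is then inverted. The sketch assumes the standard curve is linear over the range used, which the text does not state. The standards, signal values, and function names are made up.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def to_concentration(signal, slope, intercept):
    """Invert the standard curve to get concentration from a signal."""
    return (signal - intercept) / slope


# Made-up DFMU standards (uM) and fluorescence readings (arbitrary units).
standards_uM = [0.0, 0.5, 1.0, 2.0, 5.0]
signals = [4.0, 64.0, 124.0, 244.0, 604.0]

slope, intercept = fit_line(standards_uM, signals)
dfmu_uM = to_concentration(304.0, slope, intercept)

# Final enzyme concentration after adding 20 ul of 2.1 uM BglP to 980 ul.
enzyme_uM = 2.1 * 20 / (980 + 20)
print(dfmu_uM, enzyme_uM)
```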

Inactivation and reactivation of BglP
Inactivation. In individual 50-μl reactions, 5 μM BglP was combined with 0, 2, 5, 10, 20, 25, 50, and 75 mM DNP2FGlc in buffer C and incubated in a 25°C circulating water bath. At 0, 10, 30, 60, 90, and 120 min after beginning the incubation, 5 μl of each reaction was transferred to a matched 1-cm path length quartz cuvette containing 50 mM pNPGlc in 195 μl of buffer C, and the change in absorbance at 400 nm was measured over 5 min. Turnover rates were calculated using the extinction coefficient for pNP (7280 M⁻¹ cm⁻¹), plotted against time, and fitted to a first-order expression using GraphPad Prism version 6.0 to give apparent rate constants for inactivation at each concentration of DNP2FGlc. A replot of these versus DNP2FGlc concentration was fit to a Michaelis-Menten-like expression to yield values of ki and Ki.
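The two-stage analysis above can be sketched as follows. Note the substitution: the study fitted residual rates to a first-order expression by non-linear regression in Prism, whereas this sketch estimates the apparent rate constant from a straight-line fit of ln(rate) against time, which gives the same answer only for noise-free data. The residual rates are synthetic, and the function names are hypothetical.

```python
from math import exp, log

TIMES_MIN = [0, 10, 30, 60, 90, 120]  # sampling times from the protocol


def k_obs_from_decay(times, residual_rates):
    """Apparent first-order inactivation constant from ln(rate) vs time."""
    y = [log(r) for r in residual_rates]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(y) / n
    stt = sum((t - mean_t) ** 2 for t in times)
    sty = sum((t - mean_t) * (yi - mean_y) for t, yi in zip(times, y))
    return -sty / stt  # the decay constant is minus the slope


def k_obs_model(inactivator, k_i, K_i):
    """Michaelis-Menten-like dependence: k_obs = k_i * [I] / (K_i + [I])."""
    return k_i * inactivator / (K_i + inactivator)


# Synthetic, noise-free residual rates decaying at 0.02 min^-1.
synthetic_rates = [exp(-0.02 * t) for t in TIMES_MIN]
k_obs = k_obs_from_decay(TIMES_MIN, synthetic_rates)
print(k_obs)
```

Repeating the first step at each DNP2FGlc concentration gives the set of apparent rate constants that `k_obs_model` describes. At [I] equal to Ki the model returns half of ki.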

X-ray crystallographic structure determination
To produce BglP in quantities sufficient for crystallization trials, E. coli BL21-Gold (DE3) harboring the pET28-BglP.h6 expression plasmid was grown in 500 ml of LB media supplemented with 35 μg/ml kanamycin at 37°C to an A600 of 0.5-0.6 and then induced with 1 mM IPTG and grown for an additional 18-20 h at 16°C with shaking. The culture was pelleted by centrifugation, resuspended in lysis buffer (50 mM HEPES, pH 7.0, 500 mM NaCl, 10% glycerol (v/v), and 5 mM imidazole), and lysed using a French pressure cell press (Aminco). The lysate was centrifuged at 17,000 × g for 1 h at 4°C. The supernatant was mixed with 1.5 ml of nickel-nitrilotriacetic acid resin (Qiagen) for 1 h at 4°C, then poured into a gravity column and washed with 25 ml of lysis buffer, followed by 25 ml of lysis buffer supplemented with 25 mM imidazole. Recombinant BglP was eluted from the washed resin using lysis buffer supplemented with 250 mM imidazole and dialyzed overnight against 500 mM NaCl, 50 mM Tris-Cl, pH 7.5, 10% glycerol (v/v), and 1 mM DTT. The protein was concentrated and loaded onto a size-exclusion gel filtration column (Superdex 75) pre-equilibrated with dialysis buffer. Fractions containing BglP were pooled and concentrated using an Amicon UltraCentricon spin cartridge (Merck Millipore).
When concentrated to 5 mg/ml, BglP crystallized after 1 day in a mother liquor composed of 27% PEG 1000 and 100 mM MES, pH 6.5, by hanging-drop vapor diffusion. To obtain a complex of BglP with DNP2FGlc, crystals of the protein were soaked for 1 h with the ligand at a final concentration of 20 mM. Both native and ligand-bound BglP were subsequently flash-cooled in liquid N2 using the mother liquor above as a cryosolution.
X-ray diffraction data were collected in-house at 100 K using a Rigaku MicroMax HF X-ray generator and R-AXIS IV++ image plate detector. Data were indexed using MOSFLM (57) and scaled using Aimless (58). A molecular replacement (MR) search model was generated by SCULPTOR (59) using the crystal structure of a family GH3 N-acetylglucosaminidase (PDB code 3BMX) and a pairwise sequence alignment of the search model with BglP generated in GENEIOUS 8.1.7 (60). MR was carried out using PHENIX.PHASER (61) to generate initial phase estimates for reflections collected from a crystal of BglP bound to 2FGlc, followed by model building using PHENIX.AUTOBUILD (61) and iterative model improvement using COOT (62) and PHENIX.REFINE (61). A model of 2FGlc covalently linked to BglP was fitted into its ascribed density using COOT and restrained during refinement using geometric restraints generated by PHENIX eLBOW (61). Initial phase estimates for native BglP were obtained using the refined 2FGlc-bound model, from which 2FGlc and solvent had been removed, as an MR search model in PHENIX.PHASER, followed by iterative model building and refinement using COOT and PHENIX.REFINE.
A model of bound 2FGlc-1-P was built based on the GlcNAc-MurNAc substrate bound to a GH3 NagZ from B. subtilis (PDB code 4GYJ) and 2-deoxy-2-fluoro-α-D-glucose-1-phosphate (gfp_msd.pdb) from the HIC-up database (63). MurNAc was replaced in the GlcNAc-MurNAc substrate with the phosphate group of gfp while maintaining the distorted conformation of GlcNAc and pseudoaxial orientation of the glycosyl ester linkage connecting C1 of the sugar to phosphate. The 2-acetamide group of GlcNAc was then replaced with a fluorine to complete the model, which was fitted into the active site using the experimentally determined BglP-2FGlc complex as a guide.

Accession numbers
Fosmid sequences for 29K06 (MF625023) and 31P01 (MF625024) can be found in GenBank™. Coordinate files and structure factors have been deposited in the Protein Data Bank under accession codes 5VQD for native BglP and 5VQE for BglP bound to 2FGlc.