Assignment of Sweet Almond β-Glucosidase as a Family 1 Glycosidase and Identification of Its Active Site Nucleophile*

Sweet almond β-glucosidase is a well studied glycosidase, having been subjected to numerous kinetic analyses and inhibition studies. However, it is not known to which glycosidase family it belongs, nor is the identity of the active site nucleophile known with certainty. It can be inactivated using the specific, mechanism-based enzyme inactivator 2-deoxy-2-fluoro-β-D-glucopyranosyl fluoride, which functions by forming a stable 2-deoxy-2-fluoro-α-D-glucopyranosyl-enzyme intermediate. The glycosylated peptide present in a peptic digest of this trapped glycosyl-enzyme intermediate was identified by use of neutral loss scans on an electrospray ionization triple quadrupole mass spectrometer. Comparative liquid chromatographic/mass spectrometric analysis of peptic digests of labeled and unlabeled enzyme samples confirmed the unique presence of this peptide of m/z = 1041 in the labeled sample. The sequence of this peptide was determined to be Ile-Thr-Glu-Asn-Gly-Val-Asp-Glu by further tandem mass spectrometric analysis in the daughter ion scan mode in conjunction with Edman degradation of the purified peptide. The identity of the labeled side chain was determined by further tandem mass spectrometric analysis in the daughter ion scan mode of a partially purified sample of the labeled peptide subjected to methyl esterification, the fragmentation pattern being consistent only with the first Glu in the sequence being labeled. The sequence around this residue is identical to that surrounding the catalytic nucleophile in many members of glycosidase Family 1, confirming the assignment of this enzyme to that family. The residue labeled is, however, different from that (Asp) identified previously in the enzyme from bitter almonds by use of conduritol epoxide affinity labels, although apparently close in the primary sequence.

The β-glucosidase from almonds has probably been the subject of more kinetic studies than any other glucosidase. This is a consequence not only of its long history, the enzyme being easily isolated in large quantities from a readily available material, and its commercial availability, but also of the simplicity of its assay using aryl glycosides in particular. The enzyme is known to exist in a number of isoenzymic forms, which can cause some difficulties in the interpretation of kinetic studies. However, these isozymes have been separated in some cases (1,2).
Sweet almond β-glucosidase has long been known to hydrolyze glycosides with net retention of anomeric configuration (3). Therefore, it presumably follows the standard mechanism of such retaining glycosidases, in which the substrate binds to an active site containing a pair of carboxylic acids (4,5). In a first step termed the glycosylation step, the glycoside is protonated by one of these acids (the acid/base catalyst) while undergoing attack at the anomeric center by the other, resulting in the formation of a covalent glycosyl-enzyme intermediate. This covalent intermediate is then hydrolyzed in a second step (the deglycosylation step) by general base-catalyzed attack of water at the anomeric center. Both steps proceed via transition states with substantial oxocarbenium ion character. Evidence in support of this mechanism has come from a number of studies, most notably from the biphasic, downward curving Hammett relationship observed between the kinetic parameters for hydrolysis of a series of aryl glycosides and the phenol leaving group ability (4,5). This clearly indicates a two-step mechanism. Evidence for the oxocarbenium ion-like transition state also arises from the impressive inhibition afforded by transition state analogue inhibitors such as gluconolactone (6), as well as numerous measurements of secondary deuterium kinetic isotope effects (4,5). This enzyme has also been the subject of a number of studies with irreversible inactivators. Thus, β-D-glucosylisothiocyanates were tested early on as inactivators of this enzyme, presumably functioning via affinity labeling of nucleophilic active site residues (7). The most useful insight has come from important, early studies by Legler using conduritol epoxides as active site-directed affinity labels (8). These reagents are polyhydroxylated cyclohexanes containing a reactive epoxide functionality.
Because they mimic a simple glycoside, they presumably bind at the active site, then become activated by protonation of the epoxide, and suffer attack by a suitably disposed enzymic nucleophile. In a series of careful studies on sweet almond enzymes A and B, Legler showed stoichiometries of binding of 1.7 and 1.0 for isozymes A and B, respectively, according to different kinetic parameters. He also showed that the stereochemistry of the cyclitol product released upon cleavage of the cyclitol peptide ester with hydroxylamine was consistent with trans-diaxial opening of the epoxide moiety. Similar results were obtained with the enzyme from bitter almonds. In addition, in that case, he was able to isolate and sequence the derivatized peptide by use of radioactive labels (9). In this way, he identified a labeled aspartate residue, and quite reasonably assigned this the role of active site nucleophile. Unfortunately, it has been established recently (10-12) that, although these conduritol epoxides do indeed label active site residues, in a number of cases the residue labeled has not actually been the catalytic nucleophile, but some other carboxylic acid. In at least one case, the residue labeled was subsequently shown to be the acid/base catalyst (10). However, this is not always the case, as has been shown recently (13). It was therefore necessary to re-investigate the identity of the active site nucleophile in sweet almond β-glucosidase. An additional benefit of such a study might be the assignment of the enzyme to one of the sequence-related families of glycosidases delineated by Henrissat (14-16), inasmuch as sequences around the active site nucleophile are typically highly conserved and unique for the family.
The reagents that have made the unequivocal identification of the active site nucleophiles in retaining glycosidases possible are the 2-deoxy-2-fluoro glycosides with activated leaving groups, most commonly the 2,4-dinitrophenyl glycoside, or the glycosyl fluoride (12,17), which function by forming stabilized glycosyl-enzyme intermediates. The fluorine at the 2-position destabilizes the oxocarbenium ion-like transition states for formation and hydrolysis of the intermediate, while the presence of a good leaving group ensures that the formation of the intermediate is faster than hydrolysis. The trapping of the covalent intermediate in a β-glucosidase and a cellulase has been verified by ¹⁹F NMR (18) and even by x-ray crystallography (19). Identification of the catalytic nucleophile is then made possible by proteolytic digestion of the labeled enzyme, followed by HPLC separation of the resultant peptides and localization of the labeled peptide. This can be achieved either by use of a radiolabeled inactivator, or preferably by use of electrospray ionization tandem mass spectrometry to detect specific fragmentations associated with such a glycosylated peptide (11,12,20). The active site nucleophiles of a number of glycosidases have now been identified by these means (12,21,22). This article describes the application of these techniques to the identification of the active site nucleophile in sweet almond β-glucosidase.

EXPERIMENTAL PROCEDURES
Enzyme Kinetics-Kinetic studies were performed at 25 °C at pH 5.6 in a buffer system containing 20 mM sodium acetate, 10 mM PIPES, 0.1 mM EDTA, and 0.1% bovine serum albumin. A continuous spectrophotometric assay based on the hydrolysis of PNPGlc was used to monitor enzyme activity by measurement of the rate of 4-nitrophenolate release (λ = 400 nm, ε = 7.28 × 10³ M⁻¹ cm⁻¹ in the buffer above) using a PU-8800 UV-visible spectrophotometer equipped with a circulating water bath.
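The conversion from an observed absorbance slope at 400 nm to a rate of 4-nitrophenolate release follows from the Beer-Lambert law. The following is a minimal Python sketch of that conversion; the 1 cm path length and the example slope are assumptions made for illustration and are not stated in the text.

```python
# Extinction coefficient of 4-nitrophenolate at 400 nm in the assay buffer (from the text).
EPSILON_M_CM = 7.28e3  # M^-1 cm^-1


def release_rate_uM_per_min(delta_a_per_min, path_cm=1.0):
    """Convert an absorbance slope (A400 per min) to a release rate in uM per min.

    path_cm = 1.0 is an assumption; the cuvette path length is not given in the text.
    """
    molar_per_min = delta_a_per_min / (EPSILON_M_CM * path_cm)
    return molar_per_min * 1e6


# Hypothetical absorbance slope, chosen only to illustrate the arithmetic.
rate = release_rate_uM_per_min(0.0728)
print(f"{rate:.1f} uM/min")
```

With these illustrative numbers, a slope of 0.0728 absorbance units per minute corresponds to about 10 µM of 4-nitrophenolate released per minute.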
The inactivation of β-glucosidase by 2FβGlcF was monitored by incubation of the enzyme (~0.04 mg·ml⁻¹) under the above conditions in the presence of various concentrations of the inactivator. Residual enzyme activity was determined at appropriate time intervals by addition of an aliquot (5 or 10 µl) of the inactivation mixture to a solution of PNPGlc (1 mM, 600 µl) in the above buffer, and measurement of nitrophenolate release.
Electrospray Mass Spectrometry-Mass spectra were recorded on a PE-Sciex API 300 triple quadrupole mass spectrometer (Sciex, Thornhill, Ontario, Canada) equipped with an Ionspray ion source. Peptides were separated by reverse phase HPLC on an Ultrafast Microprotein Analyzer (Michrom BioResources Inc., Pleasanton, CA) directly interfaced with the mass spectrometer. In each of the MS experiments, the proteolytic digest was loaded onto a C18 column (Reliasil, 1 × 150 mm), then eluted with a gradient of 0-60% solvent B over 60 min, followed by 100% B over 2 min at a flow rate of 50 µl/min (solvent A: 0.05% trifluoroacetic acid, 2% acetonitrile in water; solvent B: 0.045% trifluoroacetic acid, 80% acetonitrile in water). Spectra were obtained in either the single-quadrupole scan mode (LC/MS), the tandem MS neutral loss scan mode, or the tandem MS product ion scan mode (LC/MS/MS).
In the single quadrupole mode (LC/MS), the quadrupole mass analyzer was scanned over an m/z range of 300-2400 Da with a step size of 0.5 Da and a dwell time of 1 ms/step. The ion source voltage (ISV) was set at 5 kV, and the orifice energy (OR) was 50 V.
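The scan parameters above fix the approximate time needed for one full scan, which in turn sets how many spectra are sampled across a chromatographic peak. The sketch below works through that arithmetic; it ignores any inter-scan overhead of the instrument, which is an assumption.

```python
def scan_time_s(mz_low, mz_high, step, dwell_ms):
    """Approximate duration of one scan: number of steps times dwell time per step.

    Inter-scan overhead is ignored (an assumption; it is not described in the text).
    """
    n_steps = (mz_high - mz_low) / step
    return n_steps * dwell_ms / 1000.0


# Parameters from the text: m/z 300-2400, step 0.5, dwell time 1 ms per step.
print(f"{scan_time_s(300, 2400, 0.5, 1.0):.1f} s per scan")
```

Under these settings each scan covers 4200 steps and takes roughly 4.2 s.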
In the neutral loss scanning mode, MS/MS spectra were obtained by searching for the mass loss of m/z 165, corresponding to the loss of the 2FGlc label from a peptide ion in the singly charged state. Thus, scan range: m/z 300-2400; step size: 0.5; dwell time: 1 ms; ion source voltage (ISV): 5 kV; OR: 45; Q0 = -10; IQ2 = -49. To maximize the sensitivity of neutral loss detection, the resolution was normally compromised, but not to the point of generating artifact neutral loss peaks.
Methyl Esterification of Partially Purified Peptide-Partially purified peptide (control or labeled) was mixed with a freshly prepared solution of 2 M methanolic HCl. The mixture was incubated at room temperature for 30 min. The excess reagent was removed by centrifugation under vacuum, and then the esterified product was dissolved in 50% methanol with 5% acetic acid.

RESULTS AND DISCUSSION
Inactivation of the sweet almond β-glucosidase by 2FβGlcF occurred according to a relatively complex kinetic pattern as shown in Fig. 1. Inactivation proceeded to completion, but did not follow a simple first order process. The most likely reason for this complex kinetic behavior is the presence of at least two isozymes with different kinetic parameters within the enzyme sample, as has been reported previously (1,2).
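One way to see why a mixture of two isozymes would not give a simple first order decay is to model the residual activity as the sum of two independent exponential processes. The sketch below is only an illustration of that idea: the two-exponential form is an assumed model, and the isozyme fraction and rate constants are hypothetical values, not parameters measured in this work.

```python
import math


def residual_activity(t_min, frac_a, k_a, k_b):
    """Fraction of activity remaining at time t for a two-isozyme mixture,
    each isozyme being inactivated in its own pseudo-first-order process."""
    return frac_a * math.exp(-k_a * t_min) + (1.0 - frac_a) * math.exp(-k_b * t_min)


# Hypothetical values: fraction of isozyme A and rate constants in min^-1.
FRAC_A, K_A, K_B = 0.6, 0.50, 0.05

for t in (0, 5, 10, 20, 40):
    v = residual_activity(t, FRAC_A, K_A, K_B)
    print(f"t = {t:2d} min  residual = {v:.3f}  ln(residual) = {math.log(v):.2f}")
```

In such a model the semilogarithmic plot of residual activity against time is curved: the early slope is dominated by the faster-reacting isozyme and the late slope by the slower one, while activity still decays toward zero.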
Because the amino acid sequences at the active sites of these two isozymes are likely identical, given the highly conserved nature of active sites within families, it seemed probable that a single, labeled peptide would be obtained upon proteolytic digestion of the inactivated isozyme mixture, avoiding the need to separate the two isozymes. Peptic digestion was chosen, given the known lability of esters at pH values above 6 and given the stabilization toward tryptic digestion at neutral pH afforded by the trapped intermediate. Use of pepsin also has the advantage of producing short peptides, thereby minimizing the probability of sequence differences between the two isozymes within that region.
A peptic digest of the labeled enzyme was subjected to purification by microbore reverse phase HPLC, with monitoring of the eluent by on-line electrospray ionization mass spectrometry. The chromatogram obtained is shown in Fig. 2a. The location of the labeled peptide within this chromatogram was determined by means of a neutral loss scan in the tandem mass spectrometer (11,12). In this mode, the peptide ions are eluted into the tandem mass spectrometer and subjected to collision assisted dissociation in the collision cell of the instrument. The ester bond between the sugar and the peptide is one of the more labile linkages present, and under these conditions the fluorosugar is cleaved off as a neutral species, leaving the peptide with its original charge. The two quadrupoles are therefore scanned in a linked mode, so that only those ions which decrease in mass by the mass of the lost label after passage through the collision cell can be detected. For a singly charged peptide, this m/z difference is the mass of the label (m/z = 165).
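The linked-scan offset described above depends on the charge state, because the neutral label is lost while the charge is retained. The following short Python sketch shows the arithmetic using the nominal label mass of 165 from the text; the function names are illustrative only.

```python
LABEL_MASS = 165  # nominal mass of the 2FGlc label lost as a neutral (from the text)


def neutral_loss_offset(charge):
    """m/z difference between precursor and product ion when a neutral of
    LABEL_MASS is lost and the ion keeps its original charge."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return LABEL_MASS / charge


def product_mz(precursor_mz, charge=1):
    """m/z that the second analyzer must transmit for a given precursor m/z."""
    return precursor_mz - neutral_loss_offset(charge)


print(product_mz(1041))        # singly charged labeled peptide discussed in the text
print(neutral_loss_offset(2))  # offset that would apply to a doubly charged ion
```

For the singly charged ion at m/z 1041 the product ion appears at m/z 876; a doubly charged ion carrying the same label would instead require an offset of 82.5.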
Scanning in the neutral loss mode for the mass loss 165 from a singly charged peptide revealed one major peak (Fig. 2b) in the 2FGlc-labeled digest that was absent in an unlabeled, control digest (Fig. 2c). This clearly indicates that the peptide eluting at 23.1 min of m/z = 1041 (Fig. 2d) is the peptide of interest. The unlabeled peptide must therefore have a mass of 876 (1041 - 165) and therefore contains approximately 7-9 amino acid residues. LC/MS analysis of the control digest reveals a peptide of this same mass (876) eluting at 23.2 min, whereas no peptide of this mass is observed in the peptic digest of the labeled enzyme, nor is a peptide of mass 1041 found in peptic digests of the unlabeled protein. These results clearly indicate that this is indeed the peptide of interest.
The amino acid sequence of this labeled peptide was determined using a combination of Edman degradation of the purified peptide and tandem mass spectrometric fragmentation analysis. The peptide was purified by collecting the sample eluting around 23 min, then subjecting this to further HPLC purification as outlined under "Experimental Procedures." Edman degradation of the sample succeeded in identifying the first 4 residues of the peptide as Ile-Thr-Xaa-Asn; unfortunately, no conclusive sequence information beyond that point was obtained. The peptide was then subjected to sequence analysis by tandem mass spectrometry. A sample of the partially purified peptide of mass 1041 obtained from the peptic digest of the labeled enzyme was infused into the mass spectrometer and this peptide selected in the first quadrupole, subjected to fragmentation at an increased collision gas energy, and the masses of the fragments so derived measured by scanning the third quadrupole. The results of this analysis are shown in Fig. 3. The parent, labeled peptide (m/z = 1041) and peptide that has lost sugar (m/z = 876) are easily seen along with many other fragments. These were analyzed as is shown in Fig. 3 to yield the sequence Ile-Thr-Glu-Asn-Gly-Val-Asp-Glu, consistent with the mass of 876 for the unlabeled peptide, as shown. The N-terminal portion of this sequence could not easily be obtained a priori from this fragmentation pattern, but fortunately the Edman analysis was clear in this region.
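The agreement between the proposed sequence and the value of 876 can be checked by summing residue masses. The sketch below uses standard monoisotopic residue masses and assumes that 876 refers to the singly protonated ion, [M+H]+; that reading is an interpretation, since the text reports only nominal values.

```python
# Standard monoisotopic residue masses (Da) for the amino acids involved.
RESIDUE = {
    "Ile": 113.08406, "Thr": 101.04768, "Glu": 129.04259, "Asn": 114.04293,
    "Gln": 128.05858, "Gly": 57.02146, "Val": 99.06841, "Asp": 115.02694,
}
WATER = 18.01056   # added once per peptide for the free N and C termini
PROTON = 1.00728   # charge carrier for the singly protonated ion


def mh_plus(sequence):
    """Monoisotopic m/z of the singly protonated peptide, [M+H]+."""
    residues = sequence.split("-")
    return sum(RESIDUE[r] for r in residues) + WATER + PROTON


proposed = "Ile-Thr-Glu-Asn-Gly-Val-Asp-Glu"
alternative = proposed.replace("Asn", "Gln")  # hypothetical variant, for comparison

print(f"{proposed}: {mh_plus(proposed):.2f}")
print(f"{alternative}: {mh_plus(alternative):.2f}")
```

The Asn-containing sequence gives a value that rounds to 876, whereas a Gln at the fourth position would add a further 14 Da and round to 890, so the measured mass is consistent with the Asn assignment from Edman degradation.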
The presence of three side chain carboxylic acids plus the C-terminal carboxylic acid group was confirmed by subjecting the peptide to acid-catalyzed methyl esterification in methanol, followed by mass analysis. The mass measured for this esterified peptide (932) is 56 Da greater than that of the parent peptide, consistent with the formation of four methyl esters. Equivalent treatment of the purified labeled peptide resulted in two new peptides, one of mass 1083 corresponding to the addition of three methyl esters and one of mass 932, which arises from trans-esterification at the position of labeling in addition to ester formation at the other three positions. This confirms the attachment of the label to one of the carboxylic acids.
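The mass shifts quoted above follow from a nominal gain of 14 Da per methyl ester formed. The following sketch reproduces the three values from the text; the helper name is illustrative only.

```python
METHYL_SHIFT = 14  # nominal mass gain per methyl ester formed (COOH -> COOCH3)
LABEL = 165        # nominal mass of the 2FGlc label (from the text)


def esterified(mass, n_esters):
    """Nominal mass after converting n_esters carboxyl groups to methyl esters."""
    return mass + n_esters * METHYL_SHIFT


unlabeled, labeled = 876, 1041

# Unlabeled peptide: three side chain carboxyls plus the C terminus.
print(esterified(unlabeled, 4))
# Labeled peptide: one carboxyl is occupied by the sugar, so only three esters form.
print(esterified(labeled, 3))
# Trans-esterification: the label is displaced and replaced by a fourth methyl ester.
print(esterified(labeled - LABEL, 4))
```

The three printed values are 932, 1083, and 932, matching the masses reported for the esterified control peptide and for the two products obtained from the labeled peptide.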
The identity of the labeled amino acid within this peptide was determined by MS/MS analysis of the methylated glycopeptide (m/z = 1083). As shown in Fig. 4, loss of the neutral sugar moiety occurs readily yielding a peptide of m/z 918. Further fragmentation revealed that both the C-terminal glutamic acid and the penultimate aspartic acid were present as their methyl esters, as revealed by loss of 175 (Glu = 147 + 2 × methyl ester = 28) and 129 (Asp = 115 + methyl ester = 14), leaving fragments of 743 and 614. Observation of such fragmentations requires that the glutamic acid residue near the N-terminal end of the peptide, within the sequence Ile-Thr-Glu-Asn-Gly, is the site of labeling.
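The fragment masses in this argument form a simple subtraction ladder starting from the methylated glycopeptide. The sketch below steps through it with the nominal values given in the text; the list structure and labels are illustrative only.

```python
METHYL = 14  # nominal mass gain per methyl ester

# Successive neutral losses from the methylated glycopeptide (values from the text).
LOSSES = [
    ("2FGlc label", 165),
    ("C-terminal Glu, dimethyl ester", 147 + 2 * METHYL),  # 175
    ("Asp residue, methyl ester", 115 + METHYL),           # 129
]


def fragment_ladder(precursor_mz, losses):
    """Return (description, loss, remaining m/z) after each successive loss."""
    ladder = []
    mz = precursor_mz
    for name, loss in losses:
        mz -= loss
        ladder.append((name, loss, mz))
    return ladder


for name, loss, mz in fragment_ladder(1083, LOSSES):
    print(f"-{loss:3d}  {name:32s} -> m/z {mz}")
```

The ladder runs 1083 to 918 to 743 to 614. Because the C-terminal Glu and the adjacent Asp are both lost as methyl esters, neither can have carried the sugar, which leaves the Glu in Ile-Thr-Glu-Asn-Gly as the labeled residue.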
This sequence is identical to that found around the active site nucleophile in the Family 1 β-glucosidase from Agrobacterium faecalis, and highly conserved within that family (26). This indicates that the sweet almond β-glucosidase belongs to Family 1, a finding consistent with the presence of a number of other plant β-glucosidases within this family. It is also consistent with the fact that many of the enzymes within this family have both β-glucosidase and β-galactosidase activity, as also does the sweet almond enzyme (1,27,28). Interestingly this same sequence (Ile-Thr-Glu-Gln-Gly) is contained within the N-terminal end of the sequence originally identified in the conduritol epoxide-labeled bitter almond enzyme (9), the complete sequence being Ile-Thr-Glx-Glx-Gly-Val-Phe-Gly-Asp-Ser-Glx-(Ala-Asx2-Pro)-Lys. The conduritol label was found to be attached to the underlined Asp. This would indicate that, not surprisingly, the bitter almond enzyme is also a member of Family 1.
The three-dimensional structure of the clover β-glucosidase, a Family 1 enzyme, has been determined recently (29). The sequence of the bitter almond β-glucosidase peptide was aligned with the sequence of the clover β-glucosidase and the available, three-dimensional structure inspected for the location of the residue labeled by the conduritol epoxide. The labeled Asp was found some 10 Å away from the active site nucleophile on the entrance to the tunnel leading into the active site. Thus, it appears either that the conduritol epoxide binds to and tags a subsite or that, after protonation in the active site, the epoxide diffuses outward and reacts with the Asp residue. This, therefore, provides another example of the nonspecificity of labeling with conduritol epoxides and provides some insight into how this occurs.
Conclusion-On the basis of the sequence of the peptide derived from a trapped glycosyl-enzyme intermediate, the sweet almond β-glucosidase is assigned to glycosidase Family 1, with its active site nucleophile contained within the sequence Ile-Thr-Glu-Asn-Gly. Prior observation (9) of an identical sequence within a labeled peptide from the bitter almond β-glucosidase similarly assigns that enzyme to Family 1.