The endogenous galactofuranosidase GlfH1 hydrolyzes mycobacterial arabinogalactan

Despite impressive progress made over the past 20 years in our understanding of mycolylarabinogalactan-peptidoglycan (mAGP) biogenesis, the mechanisms by which the tubercle bacillus Mycobacterium tuberculosis adapts its cell wall structure and composition to various environmental conditions, especially during infection, remain poorly understood. Being the central portion of the mAGP complex, arabinogalactan (AG) is believed to be the constituent of the mycobacterial cell envelope that undergoes the least structural changes, but no reports exist supporting this assumption. Herein, using recombinantly expressed mycobacterial protein, bioinformatics analyses, and kinetic and biochemical assays, we demonstrate that the AG can be remodeled by a mycobacterial endogenous enzyme. In particular, we found that the mycobacterial GlfH1 (Rv3096) protein exhibits exo-β-d-galactofuranose hydrolase activity and is capable of hydrolyzing the galactan chain of AG by recurrent cleavage of the terminal β-(1,5) and β-(1,6)-Galf linkages. The characterization of this galactosidase represents a first step toward understanding the remodeling of mycobacterial AG.

The mycobacterial cell envelope is remarkably complex and provides a permeability barrier against drugs as well as against the host's immune attack. Although mycobacteria are classified as Gram-positive bacteria, their cell wall presents characteristics reminiscent of Gram-negative bacteria, such as an outer membrane, which distinguishes the genus Mycobacterium from the majority of other prokaryotic organisms.
From the inside to the outside, the mycobacterial cell envelope consists of a plasma membrane surrounded by the mycolylarabinogalactan-peptidoglycan (mAGP) complex. The mAGP is composed of a central, highly branched arabinogalactan (AG) polysaccharide linked to the peptidoglycan (PG) at its reducing end and to long-chain mycolic acids at its nonreducing end. An outer membrane, also called the mycomembrane, analogous to but different in composition from that of Gram-negative bacteria, contains in its inner leaflet the mycolic acid residues covalently attached to arabinogalactan (1). In Mycobacterium tuberculosis, the outer membrane also contains numerous lipids and noncovalently bound glycoconjugates such as lipomannan, lipoarabinomannan, diacylglycerol, glycopeptidolipids, phenolic glycolipids, sulfolipids, trehalose dimycolate, phthiocerol dimycocerosate, di- and pentaacyl trehaloses, and triacylglycerols, the relative amounts of which vary between species (2). mAGP is an essential and major constituent of the envelope (3). In this composite molecule, AG is the polysaccharide that physically links the PG to the mycolic acids, producing a viscous hydrophilic region (4). Although some structural features remain uncertain, the overall architecture of AG is now well established, especially for Mycobacterium tuberculosis (5,6).
Unlike other bacterial polysaccharides, AG does not have a primary structure based on the repetition of a short unit; it is instead composed of a linker to the PG and a galactan chain substituted by several arabinan chains. In mycobacteria, AG is linked through its reducing end to the PG via a disaccharide-phosphate unit, α-L-Rhap-(1,3)-D-GlcpNAc-(1,P), resulting in about 1.3 molecules of AG per 10 repeating units of PG and in the substitution of 10-12% of the N-glycolylmuramic acid (MurNGlyc) residues via a phosphodiester bond. From there extends the galactan moiety, a linear chain of alternating 5- and 6-linked β-D-Galf residues estimated to be around 25 residues long (5). Arabinan chains are attached via O-5 of the β-(1,6)-Galf residues at positions 8, 10, and 12 of the galactan chain, although it has recently been proposed that only two of these three positions are substituted by arabinan chains; the exact positions of these two attachments remain unknown (5,6). The arabinan chains consist of a backbone of α-(1,5)-linked α-D-Araf residues with several α-(1,3)-linked residues forming 3,5-Araf branching points. The nonreducing end forms a hexa-arabinoside motif, [β-D-Araf-(1,2)-α-D-Araf]₂-3,5-α-D-Araf-(1,5)-α-D-Araf, esterified with mycolic acids at position 5 of the terminal β-D-Araf and of the 2-linked α-D-Araf residues.
The structure of the mycobacterial cell envelope is expected to undergo numerous modifications, especially during cell growth and division, under antibiotic stress, and during infection of the host. Survival of mycobacteria during infection is very likely to depend heavily on the plasticity of their cell envelope (7). Envelope remodeling is a key mechanism that M. tuberculosis, for example, uses to adapt to the changing host environment, through extensive modification of the PG, AG, and mycolic acids.
Although AG is often considered a polysaccharide with a rigid structure, its structure could be modified by specific glycosidases, through either conventional hydrolysis or transglycosylation. This hypothesis is supported by the observation of an as-yet-unidentified endoarabinase activity able to cleave the arabinan chain in Mycobacterium smegmatis (8) and by the discovery of the extractable lipid dimycolyl diarabinoglycerol (DMAG) in the cell envelope of Mycobacterium bovis BCG (9). In this context, DMAG can be viewed as the product of the transfer of a mycolylated arabinan moiety onto glycerol by a transglycosidase.
In this work, we sought to identify glycosidases specific to mycobacterial AG that could dynamically modify its structure. We present the first identification and characterization of a galactofuranohydrolase in M. tuberculosis (GlfH1), which acts processively on the galactan chain of AG. Among enzymes with similar functions, it is unusual in its ability to cleave two types of glycosidic linkages and in its absolute requirement for Ca²⁺. GlfH1 is also an unusual member of the GH5_13 CAZy subfamily, which comprises almost exclusively glucosidases and mannosidases.

Identification of a novel galactosidase from M. tuberculosis
The seminal observation of arabinan-degrading activity in M. smegmatis introduced the idea that mycobacteria may produce a set of specific glycosidases dedicated to remodeling endogenous cell wall polysaccharides (8). We therefore searched for such enzymes, starting with galactosidases capable of cleaving the galactan chain of AG in M. tuberculosis. An initial search of the CAZy database (February 2019 update) failed to identify any candidate galactofuranosidase among the 31 predicted glycosidases of M. tuberculosis. Based on the known similarity between D-galactosidase and L-arabinase activities (10) and the structural similarity between L-Araf and D-Galf, we used α-L-arabinofuranosidases as a template to search for potential β-D-galactofuranosidases (EC 3. Among all these families, only GH family 3 has two members, Rv0186 (NP_214700.1) and Rv0237/LpqI (YP_177702.1) (11), among the annotated ORFs of M. tuberculosis H37Rv. A third potential galactofuranosidase candidate, Rv3096/GlfH1 (NP_217612.1), was eventually found in family GH5, which comprises β-xylanases (EC 3.2.1.-) that often share sequence similarity and activity with α-L-arabinofuranosidases (12)(13)(14). Of these three candidates, which were produced as recombinant proteins in E. coli and purified, Rv0186 could not be expressed correctly (Fig. S1) and Rv0237/LpqI showed no activity toward synthetic substrates (Table S1) or M. bovis BCG AG. In contrast, Rv3096, subsequently designated GlfH1, showed glycosidase activity toward the galactan moiety of M. bovis BCG AG, releasing free galactose (Fig. S2). This protein was therefore selected for further analyses.
The Rv3096 gene is 1140 bp long and encodes a 379-amino acid protein predicted by the SignalP-5.0 server to contain a 27-amino acid signal peptide (15). The molecular mass of GlfH1 without the signal peptide was calculated at 40,382 Da. Sequence analysis by SSDB identified a Pfam Cellulase (GH5) motif spanning residues 150 to 282 (Fig. 1A). GlfH1 is classified in subfamily GH5_13 in the CAZy database. Surprisingly, to date this subfamily contains only glycosidases that use glucose- or mannose-containing polysaccharides as substrates, e.g. mannosidases, mannanases, cellulases, and xylanases (EC 3.2.1.-). All members of this CAZy subfamily are β-retaining glycosidases.
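As a quick consistency check on these sequence figures, the Python sketch below converts the ORF length into residue counts. It assumes that the 1140 bp include the stop codon, so 380 codons give 379 residues, and removing the predicted 27-residue signal peptide leaves a 352-residue mature chain. The helper names are illustrative and not taken from any tool used in this study.

```python
def residues_from_orf(orf_bp: int, includes_stop: bool = True) -> int:
    """Number of amino acid residues encoded by a complete ORF of orf_bp base pairs."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_bp // 3
    # The stop codon is translated into no residue.
    return codons - 1 if includes_stop else codons


def mature_length(precursor_len: int, signal_peptide_len: int) -> int:
    """Length of the chain left after cleavage of the N-terminal signal peptide."""
    return precursor_len - signal_peptide_len


precursor = residues_from_orf(1140)   # 380 codons, the last one being the stop codon
mature = mature_length(precursor, 27) # 27-residue signal peptide predicted by SignalP-5.0
print(precursor, mature)              # → 379 352
```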
A multiple sequence alignment generated by MUSCLE for 13 mycobacterial members of the GH5_13 subfamily showed high similarity among all 13 sequences, with a maximum pairwise Hamming dissimilarity of 36% (56% of the residues are conserved across all sequences in the alignment) (Fig. S3). Among these sequences, 15 aspartic acid and 7 glutamic acid residues are conserved, making them candidates for the pair of catalytic acidic residues required for glycosidase activity.
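The two summary statistics quoted above, pairwise Hamming dissimilarity and the fraction of fully conserved positions, can be computed directly from an alignment, as can the list of conserved Asp and Glu candidates. The Python sketch below is illustrative only: it runs on a made-up three-sequence toy alignment rather than on the 13 GH5_13 sequences, it treats gaps as ordinary characters when computing dissimilarity (alignment software may handle gaps differently), and it counts a column as conserved only when it is gap-free and identical in every sequence.

```python
from itertools import combinations


def hamming_dissimilarity(a: str, b: str) -> float:
    """Fraction of aligned columns at which two sequences differ (gaps count as characters)."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def max_pairwise_dissimilarity(alignment: list[str]) -> float:
    """Largest Hamming dissimilarity over all pairs of aligned sequences."""
    return max(hamming_dissimilarity(a, b) for a, b in combinations(alignment, 2))


def conserved_columns(alignment: list[str]) -> list[int]:
    """1-based indices of gap-free columns carrying the same residue in every sequence."""
    conserved = []
    for i, column in enumerate(zip(*alignment), start=1):
        if column[0] != "-" and all(c == column[0] for c in column):
            conserved.append(i)
    return conserved


def conserved_acidic(alignment: list[str]) -> dict[str, list[int]]:
    """Fully conserved Asp (D) and Glu (E) columns: candidate catalytic residues."""
    result: dict[str, list[int]] = {"D": [], "E": []}
    for i in conserved_columns(alignment):
        residue = alignment[0][i - 1]
        if residue in result:
            result[residue].append(i)
    return result


# Toy alignment, NOT the real GH5_13 sequences.
toy = [
    "MDEHKWLE",
    "MDEHRW-E",
    "MDQHKFLE",
]
print(max_pairwise_dissimilarity(toy))               # → 0.5
print(len(conserved_columns(toy)) / len(toy[0]))     # → 0.5
print(conserved_acidic(toy))                         # → {'D': [2], 'E': [8]}
```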
Neighbor-joining phylogenetic analysis with PHYLIP placed all 13 GH5_13 mycobacterial glycosidase sequences into three major clades that follow the genome-based classification of Mycobacterium derived from the concatenated sequences of 1941 core proteins of this genus (16): the Mycobacterium abscessus-chelonae clade, the Mycobacterium fortuitum-vaccae clade, and the Mycobacterium tuberculosis-simiae clade (Fig. 1B). A maximum-likelihood analysis of the same sequences confirmed this observation (Fig. S4) (17).
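The clade assignment above rests on distance-based tree building. To illustrate the principle, here is a minimal Python sketch of the Saitou-Nei neighbor-joining algorithm. It is an independent illustration, not PHYLIP's `neighbor` program: it breaks ties by taking the first minimal pair it encounters, does not clamp negative branch lengths, and is run on a small additive test matrix rather than on the GH5_13 data.

```python
def neighbor_joining(labels: list[str], dist: list[list[float]]) -> str:
    """Build an unrooted tree (Newick string) from a symmetric distance matrix."""
    if len(labels) < 3:
        raise ValueError("neighbor joining needs at least three taxa")
    nodes = list(labels)
    d = [[float(x) for x in row] for row in dist]
    while len(nodes) > 3:
        n = len(nodes)
        totals = [sum(row) for row in d]
        # Q criterion: join the pair (i, j) minimising (n - 2) * d_ij - T_i - T_j.
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - totals[i] - totals[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from the new internal node to i and j.
        li = 0.5 * d[i][j] + (totals[i] - totals[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        keep = [k for k in range(n) if k not in (i, j)]
        # Distance from the new internal node to every remaining node.
        new_row = [0.5 * (d[i][k] + d[j][k] - d[i][j]) for k in keep]
        d = [[d[a][b] for b in keep] + [new_row[r]] for r, a in enumerate(keep)]
        d.append(new_row + [0.0])
        nodes = [nodes[k] for k in keep] + [f"({nodes[i]}:{li:g},{nodes[j]}:{lj:g})"]
    # Three nodes left: attach them to a single central node.
    d01, d02, d12 = d[0][1], d[0][2], d[1][2]
    l0 = (d01 + d02 - d12) / 2
    l1 = (d01 + d12 - d02) / 2
    l2 = (d02 + d12 - d01) / 2
    return f"({nodes[0]}:{l0:g},{nodes[1]}:{l1:g},{nodes[2]}:{l2:g});"


labels = ["a", "b", "c", "d", "e"]
D = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(neighbor_joining(labels, D))  # → (d:2,e:1,(c:4,(a:2,b:3):3):2);
```

Because the test matrix is additive, neighbor joining recovers its tree exactly. Distances estimated from real protein alignments are only approximately additive, which is one reason to cross-check the topology with a maximum-likelihood analysis, as was done here.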
In subfamily GH5_13, the crystal structures of several glycosidases have been solved and their two catalytic residues, the first performing the nucleophilic attack and the second acting as the acid/base, have been identified (18-20). Of note, the GH5_13 proteins in the CAZy database harbor two glutamic acid residues as catalytic amino acids. A MUSCLE multiple alignment of GlfH1 with these proteins highlights two candidates, Glu-170 and Glu-268, as the acid/base and catalytic nucleophile residues, respectively (Fig. 1C). His-242 is also conserved among these sequences. This residue is essential for the activity of GH5 glycosidases (21) and, in the case of GlfH1, could serve to stabilize the Ca²⁺ ion necessary for its activity.

Production and purification of GlfH1
GlfH1 of M. tuberculosis was produced in E. coli BL21 as a fusion protein using the pET32a expression vector (the sequence of vector pET32a-Rv3096 is shown in Fig. S3). Initial attempts showed that the fusion protein was mainly produced as inclusion bodies: GlfH1 was found in large quantities in the insoluble fraction but was barely detectable in the soluble fraction (Fig. S6). As inclusion body formation in E. coli is mainly caused by its high rates of translation and protein folding (about 10-fold higher than in eukaryotic cells) (22), auto-induction medium (AIM) was used to minimize this phenomenon as much as possible, as described under "Experimental procedures." AIM allows isopropyl 1-thio-β-D-galactopyranoside-inducible expression strains to be grown first without induction and then induces production of the recombinant protein. The detergent N,N-dimethyldodecylamine N-oxide (LDAO) was used during the extraction step to stabilize soluble GlfH1 and prevent it from aggregating once extracted. A three-step chromatography protocol was used for purification (Fig. S7), the first step being the most important for isolating GlfH1. The E. coli chaperone protein that co-purified with GlfH1 was removed by incubation with ATP. The second step served to concentrate the fractions obtained previously, because GlfH1 could not be concentrated by ultrafiltration. A final gel-filtration step removed imidazole from the eluent and transferred GlfH1 directly into its storage buffer. The final purity of GlfH1 was confirmed by SDS-PAGE (Fig. S8). This optimized purification pipeline yielded between 0.2 and 2 mg/liter of active enzyme.
All enzymatic parameters of GlfH1 (initial rate kinetics, optimum pH, and specificity) were measured on three forms of the protein: the full-length expressed protein (with signal peptide and tags), a form without the signal peptide, and a form without the tags. The three forms gave similar results (Table S2). The results presented in this work were obtained with the full-length form of GlfH1.

GlfH1 is a β-retaining exo-β-D-galactofuranohydrolase
To determine the substrate specificity of GlfH1, its hydrolytic activity was measured against synthetic pNP-glycosides differing in the nature of their monosaccharide, their anomeric configuration, and their ring size (Fig. 2A).
Of the 12 synthetic pNP-glycosides tested, only pNP-β-D-Galf was hydrolyzed by GlfH1, indicating that the enzyme is specific for β-D-Galf. Notably, GlfH1 showed no activity toward pNP-α-L-Araf, although the two activities are frequently associated in other galactofuranosidases (23). Following this initial characterization, large quantities of pNP-β-D-Galf were prepared through a new synthetic route to further characterize the enzyme's parameters. The synthesis was achieved in six steps from commercially available D-galactose (see supplemental experimental procedures). The analytical data (¹H and ¹³C NMR spectroscopy and MS) matched those reported in the literature (24-26).
To characterize the stereochemical specificity of a glycosidase, one must take into account the spontaneous interconversion between the two anomeric forms of the released monosaccharide. For a hexose such as glucose, the mutarotation rate constant is 0.015 min⁻¹ (27), which is incompatible with the time scale of the usual structural analysis techniques. One approach consists of using methanol instead of water as the nucleophile to produce a methyl glycoside, which does not undergo mutarotation. GlfH1 retained 80% of its activity in the presence of 20% (v/v) methanol for short incubation times (e.g. 15 min; Fig. S9). This property was used to generate 1-O-methyl-Galf from pNP-Galf. The reaction products were then per-trimethylsilylated, analyzed by GC/MS (Fig. 2B, lower chromatogram), and compared with free galactose treated in the same way (Fig. 2B, upper chromatogram). For the galactose standard, peaks were assigned using literature data on the ratio of the four galactose isomers (αf, αp, βf, and βp) (28), with MS analysis used to assign the ring size (Fig. S10). Hydrolysis of pNP-Galf by GlfH1 in the presence of 20% methanol, followed by per-trimethylsilylation, produced a compound with the same retention time and fragmentation profile as 1-O-methyl-β-Galf. This establishes that hydrolysis of a β-linked galactofuranoside by GlfH1 proceeds with retention of the anomeric configuration, i.e. that the enzyme is a β-retaining glycosidase.
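To put the cited rate constant in perspective, the Python sketch below treats mutarotation as a first-order relaxation of the initial anomeric excess toward equilibrium. This is a simplifying assumption, and k = 0.015 min⁻¹ is the glucose value cited above; galactofuranose may behave differently. Under this model the half-life is about 46 min, so roughly a fifth of the anomeric excess would already be lost during a 15-min incubation, and more during workup and analysis, which is why trapping the product as a non-mutarotating methyl glycoside is useful.

```python
import math


def half_life(k_per_min: float) -> float:
    """Half-life (min) of a first-order process with rate constant k (min^-1)."""
    return math.log(2) / k_per_min


def fraction_remaining(k_per_min: float, t_min: float) -> float:
    """Fraction of the initial deviation from anomeric equilibrium left after t minutes."""
    return math.exp(-k_per_min * t_min)


k = 0.015  # min^-1, mutarotation rate constant quoted for glucose (ref. 27)
print(f"{half_life(k):.1f} min")           # → 46.2 min
print(f"{fraction_remaining(k, 15):.2f}")  # → 0.80 after a 15-min incubation
```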

Initial rate kinetics, enzymatic requirement, and inhibition
The effect of pH on GlfH1 activity was studied from pH 3.0 to 9.0 using two overlapping buffer ranges, and the optimum pH for GlfH1 activity was determined to be 4.5 (Fig. 3A). The effect of several divalent cations, namely Mn²⁺, Mg²⁺, Ca²⁺, Ni²⁺, and Zn²⁺, on enzyme activity was assessed using their chloride salts. GlfH1 activity decreased by 76 and 86% in the presence of Ni²⁺ and Zn²⁺, respectively (Fig. S11), whereas Ca²⁺ increased it by 35%. To better understand the role of Ca²⁺, the enzyme was incubated with EDTA or EGTA to deprive it of divalent metal ions. In both cases, GlfH1 activity was significantly reduced (Fig. 3B), and it was fully restored when Ca²⁺ was added, confirming that this metal ion is required for enzymatic activity. Next, the kinetic constants were determined using 0-5 mM pNP-β-D-Galf under standard activity assay conditions.

The enzymatic activity of GlfH1 was next assayed on synthetic substrates that mimic the structure of its natural substrate, the galactan moiety of AG. To do so, a homologous series of previously synthesized oligosaccharides (29) containing 5 to 12 Galf residues with alternating β-(1,5) and β-(1,6) glycosidic linkages, attached to an α-L-Rhap-(1,3)-α-D-GlcpNAc-C8 motif (Lin-Galfₓ; Fig. 4A), was used as substrate for GlfH1. In the Lin-Galfₓ nomenclature, x is the total number of Galf residues and Lin denotes the disaccharide linker α-L-Rhap-(1,3)-α-D-GlcpNAc-C8. Enzymatic hydrolysis of these oligosaccharides was carried out under standard conditions, with incubation times ranging from 2 to 120 h, and fresh GlfH1 was added at regular intervals to maintain significant hydrolytic activity. As shown in Figs. S12-S16, partial hydrolysis was detected for all tested compounds from 2 h of incubation onward.
Within 24 h of incubation, the starting oligosaccharides had almost completely disappeared, giving oligosaccharides of lower degree of polymerization (DP). This is exemplified by Lin-Galf₁₂ (Fig. 4B), which ultimately yielded low-DP oligosaccharides (Lin-Galf₂₋₄) as the major products, together with free galactose, after 120 h of incubation. All oligosaccharides were degraded, regardless of whether the Galf residue at the nonreducing end was β-(1,5)- or β-(1,6)-linked. The identity of the released oligosaccharides was confirmed by MALDI-TOF MS, as shown for Lin-Galf₁₂ in Fig. 4C. Overall, these results establish GlfH1 as an exo-galactofuranohydrolase that degrades mycobacterial galactan by sequentially cleaving alternating β-(1,5)- and β-(1,6)-linked residues from the nonreducing end of the molecule.
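Sequential exo-action from the nonreducing end produces a predictable ladder of products, and because the terminal linkage alternates with chain length, a series running from x = 5 to x = 12 exposes both β-(1,5)- and β-(1,6)-linked terminal residues to the enzyme. The Python sketch below enumerates that ladder. The linkage pattern it uses is an assumption modeled on the native galactan (first Galf β-(1,4)-linked to Rha, then β-(1,5) and β-(1,6) alternating), not read from Fig. 4A, and the sketch ignores kinetics entirely.

```python
def galf_linkage(position: int) -> str:
    """Linkage through which Galf number `position` (1 = residue next to Rha) is attached.

    ASSUMPTION: native galactan pattern, i.e. Galf1 beta-(1,4) to Rha, then
    beta-(1,5) at even positions and beta-(1,6) at odd positions >= 3.
    """
    if position < 1:
        raise ValueError("positions start at 1")
    if position == 1:
        return "beta-(1,4)"
    return "beta-(1,5)" if position % 2 == 0 else "beta-(1,6)"


def exo_ladder(n_galf: int, final_dp: int = 1):
    """Yield (remaining Galf count, linkage cleaved, free Gal released so far)
    for one-at-a-time removal of Galf residues from the nonreducing end."""
    released = 0
    for position in range(n_galf, final_dp, -1):
        released += 1
        yield position - 1, galf_linkage(position), released


for dp, linkage, free_gal in exo_ladder(12, final_dp=4):
    print(f"Lin-Galf{dp} + {free_gal} Gal (last bond cleaved: {linkage})")

# Terminal linkage across the tested series Lin-Galf5 ... Lin-Galf12
print({x: galf_linkage(x) for x in range(5, 13)})
```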

Mycobacterial AG is the endogenous substrate of GlfH1
The strict specificity of GlfH1 for β-D-Galf strongly suggests that this enzyme may be involved in remodeling endogenous cell wall arabinogalactan. To define the activity of GlfH1 on this substrate (Fig. 5A), soluble AG, AGP, and mAGP purified from M. bovis BCG were incubated with the enzyme under standard conditions, with fresh GlfH1 added at regular intervals to maintain significant hydrolytic activity. The reactions were analyzed by high performance anion exchange chromatography (HPAEC) and GC/MS. As shown on the HPAEC chromatogram (Fig. 5B), release of free Gal following GlfH1 treatment of AG, AGP, and mAGP was clearly observed between 5 and 7 min. However, GlfH1 preferentially used AGP over AG and mAGP (Fig. 5B): under standard conditions, Gal was released faster from AGP than from AG or mAGP (release efficiency AGP/AG/mAGP, 5.6:1:0.13). The identity of the released monosaccharide as Gal was confirmed for all conditions by GC/MS analysis (Figs. S17-S19).
Figure 3 (legend, panels B and C). B, effect of calcium. +EDTA and +EGTA: GlfH1 was pre-incubated with 10 µmol EDTA or EGTA for 10 min at 37 °C before its residual activity was measured according to the standard protocol. EDTA → Ca²⁺ and EGTA → Ca²⁺: GlfH1 was pre-incubated with 10 µmol EDTA or EGTA for 10 min at 37 °C, 50 mM CaCl₂ was then added, and residual activity was measured according to the standard protocol. C, Galf-ase activity was quantified using pNP-β-D-Galf as a substrate. Enzyme assays were carried out at 37 °C with 0.5 µg of enzyme in 20 mM citrate buffer, pH 4.5, in a total volume of 100 µl.

To further define the activity of GlfH1 on a mycobacterial endogenous substrate, soluble AG was incubated with or without GlfH1 for 48 h under standard conditions. The reaction mixtures were lyophilized, exchanged twice in deuterium oxide, and analyzed by high-field ¹H and ¹³C NMR spectroscopy. As seen in the 1D ¹H NMR spectrum (Fig. 5C), the major residues of untreated AG include 2-, 3-, and 3,5-linked α-Araf, terminal β-Araf, and 5- and 6-linked β-Galf residues (assigned according to the literature (6) and further confirmed by ¹H/¹H TOCSY and ¹H/¹³C HSQC analyses).

NMR analysis of GlfH1-treated AG showed a similar pattern with two significant differences. First, a clear H1 signal appeared at δ 4.57 (J₁,₂ = 7.8 Hz), associated with a C1 signal at δ 97.8. This species was identified as free β-Galp on the basis of its H/C correlations (H2/C2, H3/C3, and H4/C4 at δ 3.48/73.2, 3.64/74.1, and 3.93/71.7, respectively; Fig. 5C) and coupling constants (J₂,₃ = 9.9 Hz, J₃,₄ = 3.5 Hz, and J₄,₅ < 1 Hz). A weaker ¹H/¹³C signal was also observed at δ 5.25/93.6 and assigned to α-Galp, although its spin system could not be fully established because of its lower abundance and overlapping signals (Fig. 5D). These signals confirmed the release of free Galp following GlfH1 treatment. The observation of β-Galp as the major species, α-Galp as the minor species, and no free Galf agrees with the known equilibrium of Gal in solution (30/64/2.5/3.5 for αp/βp/αf/βf at 31 °C) (28). Second, the appearance of free Gal signals was concomitant with a specific decrease in the intensities of both the 5-linked and 6-linked β-Galf anomeric signals, at δ 5.02 and 5.23, respectively, in the 1D ¹H NMR spectrum (Fig. 5C). Relative quantification of these two signals against all other anomeric signals showed an overall 25% reduction of both signals compared with untreated AG. Notably, the relative signal intensities of the Ara residues did not vary, further confirming that the arabinan moiety of AG is unaffected by GlfH1.
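The relative quantification described above amounts to a ratio of ratios: each Galf anomeric integral is normalized to reference signals in the same spectrum, and the normalized values of treated and untreated AG are compared. In the Python sketch below, the integrals are hypothetical, and normalizing against the Ara anomeric signals alone is an assumption motivated by the observation that these signals did not change; the study itself normalized against all other anomeric signals.

```python
def normalized_intensity(target: float, reference: list[float]) -> float:
    """Integral of a target anomeric signal divided by the summed reference integrals."""
    return target / sum(reference)


def percent_reduction(control: float, treated: float) -> float:
    """Relative decrease (%) of a normalized signal after treatment."""
    return 100.0 * (1.0 - treated / control)


# Hypothetical integrals (arbitrary units), NOT data from this study.
ara_reference_untreated = [6.0, 5.0, 5.0]  # Ara anomeric signals, assumed unaffected by GlfH1
ara_reference_treated = [6.0, 5.0, 5.0]
galf_untreated = 4.0  # summed 5- and 6-linked beta-Galf anomeric signals
galf_treated = 3.0

ctrl = normalized_intensity(galf_untreated, ara_reference_untreated)
trt = normalized_intensity(galf_treated, ara_reference_treated)
print(f"{percent_reduction(ctrl, trt):.1f}% decrease")  # → 25.0% decrease
```

Normalizing within each spectrum cancels differences in sample amount and acquisition gain between the two NMR samples, which is why a raw integral comparison would not be appropriate.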

Deletion of glfH1 (MSMEG_5877) in Mycobacterium smegmatis leads to an attenuated growth phenotype in an amoebal infection model
To investigate the role of GlfH1 in bacterial physiology and infection, we set out to construct a deletion mutant of the closest M. smegmatis orthologue of GlfH1, identified as MSMEG_5877 by a BLAST search restricted to M. smegmatis mc²155 (taxid: 246196). A global alignment of MSMEG_5877 with GlfH1 shows 73.3% identity and 82.1% similarity (Fig. 6A). The three residues predicted to be essential for GlfH1 activity (Glu-170, His-242, and Glu-268) are strictly conserved.
We first cloned MSMEG_5877 into pET32a and purified the recombinant protein from E. coli. The enzymatic parameters of GlfH1 and MSMEG_5877, including Km, kcat, optimum pH, and Ca²⁺ requirement, were similar (Fig. 6B), strongly supporting the view that the two enzymes have similar functions in M. smegmatis and M. tuberculosis.
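Comparing Km and kcat between GlfH1 and MSMEG_5877 requires estimating both from initial rates measured over a range of pNP-β-D-Galf concentrations. The Python sketch below shows one simple way to do this, the Hanes-Woolf linearization of the Michaelis-Menten equation, on synthetic noise-free data. The fitting method actually used in this study is not specified here, and the Km, Vmax, and enzyme concentration values are hypothetical.

```python
def hanes_woolf(substrate: list[float], rates: list[float]) -> tuple[float, float]:
    """Estimate (Km, Vmax) from initial rates via the Hanes-Woolf linearization.

    [S]/v = [S]/Vmax + Km/Vmax, so an ordinary least-squares line through
    ([S], [S]/v) has slope 1/Vmax and intercept Km/Vmax.
    """
    # [S] = 0 carries no information in this plot (and 0/0 is undefined).
    pairs = [(s, v) for s, v in zip(substrate, rates) if s > 0]
    if len(pairs) < 2:
        raise ValueError("need at least two non-zero substrate concentrations")
    xs = [s for s, _ in pairs]
    ys = [s / v for s, v in pairs]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    vmax = 1.0 / slope
    km = intercept * vmax
    return km, vmax


def turnover_number(vmax: float, enzyme_conc: float) -> float:
    """kcat = Vmax / [E]total, both concentrations in the same unit (e.g. uM)."""
    return vmax / enzyme_conc


KM_TRUE, VMAX_TRUE = 0.5, 2.0          # hypothetical: mM and uM/min
s = [0.0, 0.25, 0.5, 1.0, 2.0, 5.0]    # mM, spanning a 0-5 mM range
v = [VMAX_TRUE * x / (KM_TRUE + x) for x in s]
km, vmax = hanes_woolf(s, v)
print(f"Km = {km:.3f} mM, Vmax = {vmax:.3f} uM/min")            # → Km = 0.500 mM, Vmax = 2.000 uM/min
print(f"kcat = {turnover_number(vmax, 0.01) / 60:.2f} s^-1")    # hypothetical [E] = 0.01 uM → 3.33 s^-1
```

For real, noisy measurements, direct nonlinear least-squares fitting of the Michaelis-Menten equation is generally preferable, because linearization distorts the error structure of the data; the Hanes-Woolf form is shown here only because it needs nothing beyond the standard library.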

A glfH1 deletion mutant was therefore constructed in M. smegmatis by mycobacterial recombineering, replacing MSMEG_5877 with a hygromycin resistance cassette (Fig. 6C, upper panel). Putative glfH1::hyg colonies were genotyped by PCR and Sanger sequencing, confirming the junctions between the native M. smegmatis chromosome, the cloned sequences flanking MSMEG_5877, and the hygromycin cassette (Fig. 6C, lower panel). In vitro growth of glfH1::hyg did not differ from that of the WT or complemented strains in liquid or on solid media (Fig. S20a). However, we observed a marginal increase in the sensitivity of the mutant to rifampicin, but not to imipenem (Fig. S20b). Given the hydrophobicity of rifampicin, we speculated that this slight increase in sensitivity reflected a cell wall defect allowing more efficient access of rifampicin to its cytosolic target. However, analysis of AG extracted from the WT, ΔglfH1, and complemented ΔglfH1 strains showed no observable impact of glfH1 deletion on AG structure (Fig. S21), suggesting that the modifications are confined to a subclass of AG that cannot be assessed by global analysis.

We thus asked whether this moderate cell wall defect would translate into attenuated survival of the bacterium in the harsh environment encountered inside a phagocytic cell. M. smegmatis, which does not survive in macrophages, has been shown to survive and proliferate in Acanthamoeba cells (31). We therefore infected Acanthamoeba polyphaga with M. smegmatis WT, ΔglfH1, and the complemented strain at an m.o.i. of 10 and monitored bacterial growth at 2, 24, and 48 h post-infection (hpi) by lysing the amoebal cells and plating the lysates on 7H10-OADC to determine CFU per milliliter. Massive extracellular growth of the bacteria, which could not be controlled by washing the amoebal monolayers or by adding antibiotics, prevented extension of the infection period beyond 48 hpi. As shown in Fig. 6D, ΔglfH1 showed significantly impaired growth in amoebae, with a bacillary load approximately half a log lower than that of the WT strain at 48 hpi. Genetic complementation of ΔglfH1 partially restored the WT phenotype, supporting a role for GlfH1 in intra-amoebal growth.
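Bacterial loads are compared on a log₁₀ scale, so a difference of half a log corresponds to roughly a 3.2-fold lower CFU count (10^0.5 ≈ 3.16). The Python sketch below makes this conversion and the m.o.i. arithmetic explicit; the number of amoebae and the CFU values are hypothetical.

```python
import math


def inoculum(moi: float, host_cells: float) -> float:
    """Number of bacteria to add for a given multiplicity of infection."""
    return moi * host_cells


def log10_reduction(cfu_reference: float, cfu_test: float) -> float:
    """How many log10 units the test strain lies below the reference strain."""
    return math.log10(cfu_reference) - math.log10(cfu_test)


def fold_difference(log10_units: float) -> float:
    """Convert a log10 difference back into a fold change."""
    return 10 ** log10_units


print(inoculum(10, 1e5))               # hypothetical 1e5 amoebae → 1000000.0 bacteria
print(f"{fold_difference(0.5):.2f}")   # half a log → 3.16
# Hypothetical CFU/ml values: a WT load and a load 10**0.5-fold lower.
print(round(log10_reduction(2.0e6, 2.0e6 / 10 ** 0.5), 3))  # → 0.5
```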

Discussion
Glycosidases have received considerable therapeutic attention over the past two decades: several glycosidase inhibitors have been discovered for potential use in the treatment of cancer, viral infections, neurodegenerative diseases, autoimmune diseases, and diabetes (32). In contrast, mycobacterial glycosidases have attracted very little interest, even though they may play a crucial role in the virulence and persistence of M. tuberculosis. The absence of Galf from mammalian glycoconjugates further makes its biosynthetic pathway an attractive therapeutic target (33).
Herein, we report the first endogenous galactosidase shown to target mycobacterial AG. GlfH1 behaves unusually in several respects. First, its failure to hydrolyze pNP-β-D-galactopyranoside suggests that its activity is specific to the furanose form of the monosaccharide. We also demonstrated that GlfH1 is an exo-Galf-ase that processes
In addition, this enzyme shows an absolute requirement for Ca²⁺, a rare feature among glycosidases, which typically need only a pair of carboxylate residues (Asp or Glu) in the catalytic site. A Ca²⁺ requirement has been observed for an endo-α-L-arabinanase of family GH43 from Bacillus subtilis, for several exo-mannosidases of families GH38, GH47, and GH92, and for an α-glucosidase and an α-galactosidase of family GH97 (35,36). However, no such requirement has been reported for the β-Galf-ases characterized so far. Interestingly, three histidine residues are conserved in several GH5_13 glycosidases whose structures have been solved (Fig. 1C); they may be involved in stabilizing the Ca²⁺ ion required for the hydrolytic activity of the enzyme.
It should also be noted that, since the work that led to the discovery of GlfH1, a β-D-galactofuranosidase (ALJ47066.1) has been classified in family GH5_13 (CAZy update of 2019-10-16). The gene encoding this protein was identified in Bacteroides ovatus ATCC 8483 (37), although its function has so far only been inferred. It shares 41.1% sequence identity with GlfH1, which strongly suggests a common ancestry.
The GC/MS analysis of the stereochemical outcome of hydrolysis clearly indicates that GlfH1 acts with retention of the anomeric configuration. Accordingly, its two catalytic residues can be expected to lie about 5.5 Å apart, as is typical of retaining glycosidases, compared with about 10.5 Å in inverting glycosidases (38). This should help in determining the 3D structure and identifying the catalytic residues. Hydrolysis by GlfH1 of the synthetic substrates that mimic the structure of AG indicates that the enzyme hydrolyzes these oligosaccharides regardless of the nature of the galactofuranose residue at the nonreducing end. It would therefore be instructive to investigate whether GlfH1 hydrolyzes β-Galf-(1,5) and β-Galf-(1,6) residues at the same rate. GlfH1 released Gal with different efficiencies from AG, AGP, and mAGP, AGP being the best substrate. We hypothesize that AG, because of its unusual branching pattern, adopts a structure compacted by hydrogen bonds (39), especially when it is no longer bound to PG and mycolic acids. When AG is linked to PG (AGP), fewer hydrogen bonds would form, making it more accessible to GlfH1. In contrast, when AG is simultaneously bound to PG and mycolic acids (mAGP), its accessibility to GlfH1 would be compromised by the presence of the mycolic acid layer. The attenuated growth of the glfH1 (MSMEG_5877) deletion mutant inside amoebae strongly suggests that GlfH1 is important during infection. However, in-depth studies of the role of GlfH1 in bacterial growth and infection in mouse models of M. tuberculosis infection remain to be carried out. In addition, the regulation of GlfH1-mediated AG remodeling and the possible presence of additional, as yet unidentified enzymes with functions partially overlapping those of GlfH1 will need to be taken into account in further in vivo studies of AG remodeling and its impact on bacterial physiology and pathogenicity.
The identification of the first glycosidase involved in AG degradation leads us to speculate that a functional complex of polysaccharide-remodeling enzymes exists, together with a route for recycling the degradation products. According to the KEGG galactose metabolism pathway for M. tuberculosis (RRID: SCR_018145; identifier map00520), several enzymes are required for the Leloir pathway to recycle galactose, including galactose mutarotase, galactokinase Rv0620/galK (NP_215134.1), galactose-1-phosphate uridylyltransferase Rv0618/galTa (WP_003900189.1) and Rv0619/galTb (CCP43360.1), UDP-galactopyranose mutase Rv3809c/glf (NP_218326.1), and the galactosidase GlfH1/Rv3096 (NP_217612.1). Of these, the galactose mutarotase remains unidentified. It will therefore be essential to characterize this enzyme, as well as the means by which mycobacteria transport galactose back into the cytoplasm, to complete the identification of the enzymes of the Leloir pathway. Recycling of PG in M. tuberculosis and M. bovis BCG has also been reported recently. The lipoprotein LpqI displays exo-β-N-acetylglucosaminidase activity and can cleave PG fragments in vitro: it cleaves GlcpNAc-MurpNAc disaccharides once the stem peptide has been removed, releasing free MurpNAc. MurpNAc then undergoes D-lactyl ether cleavage to release lactate, which can be used by the cell under aerobic conditions (11). Together, these two mycobacterial enzymes, LpqI and GlfH1, advance our understanding of the remodeling and recycling of the major constituents of the mycobacterial cell envelope.
Finally, the role of GlfH1 in modifying the structure of mycobacterial arabinogalactan may be associated with mycobacterial growth during infection, as shown by the infection of A. polyphaga with the M. smegmatis mutant. Interestingly, a previous study reported that GlfH1 is one of 41 proteins overexpressed in an M. tuberculosis mutant strain deficient in perM expression when cultivated at pH 4.5 (40). PerM is essential for the persistence of M. tuberculosis within IFN-γ-activated macrophages following acidification of the phagosomes to pH 4.5 (40). Most of the remaining 40 proteins with known functions are involved in remodeling envelope constituents such as PG or mycolic acids. In this biological context, the pH optimum of GlfH1 (4.5) is well suited to the environment of M. tuberculosis during chronic infection of macrophages. That GlfH1 appears necessary for optimal growth in A. polyphaga is also compatible with this hypothesis, considering that mycobacteria also reside in acidic organelles in amoebae, as they do in macrophages (41). The importance of GlfH1 in cell wall remodeling is further emphasized by its genomic environment: it is adjacent to Rv3097c, also known as lipY, which encodes a well-characterized PE protein with triacylglycerol hydrolase activity. LipY participates in the consumption and reprocessing of both mycobacterial and host lipids, thereby contributing to the persistence of the tubercle bacillus inside foamy macrophages (42)(43)(44).
Based on these observations, the expression of GlfH1 should be characterized in detail to uncover the molecular mechanisms of resistance within the phagolysosome. Likewise, future studies will be needed to investigate the putative contribution of GlfH1 to mycobacterial cell division, virulence, and persistence.

General
Ultrapure water with an electrical resistivity > 18 MΩ·cm (ELGA, Veolia Water STI, UK) was used to prepare all buffers, substrates, reagents, and enzyme reactions. All chemicals were purchased from Sigma-Aldrich or Acros Organics (France), depending on availability. HisTrap™ HP and HiTrap™ Desalting prepacked columns were purchased from GE Healthcare. The Micro BCA™ Protein Assay Kit used to determine protein concentrations was purchased from Thermo Fisher Scientific. Mass spectrometry-grade porcine trypsin was purchased from G-BioSciences (St. Louis, MO).

Bacterial strains, plasmids, and growth conditions
For recombinant protein production, E. coli strains DH5α (New England Biolabs) and BL21 (DE3) (Invitrogen) were used for cloning and protein expression, respectively. pET32a (Invitrogen) was used as the expression vector. E. coli strains were grown in Auto-Induction Media (Formedium recipe) or on Luria broth (LB) agar with the appropriate antibiotic (100 µg/ml ampicillin) at 25°C overnight with agitation.

Plasmid generation
Cloning of GlfH1-The GlfH1 open reading frame (ORF) was PCR amplified with Phusion polymerase (Thermo Fisher Scientific) using the left primer 5′-CACCGTCGAACGGCCCT-3′ and the right primer 5′-ACGAGGAAGCTTAATGCGGGCTGGGGAAAG-3′ (HindIII restriction site, AAGCTT). The product was digested with HindIII and ligated into pET32a linearized with EcoRV and HindIII to yield pET32a-Rv3096. From this construct, GlfH1 is expressed with an N-terminal tag containing, from the start codon to the GlfH1 ORF, a thioredoxin A ORF, a His6 tag, a thrombin cleavage site, an S-tag, and an enterokinase cleavage site.
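Because only the right primer is described as carrying a restriction site, the other end of the blunt Phusion amplicon presumably ligates to the blunt EcoRV end of the vector, giving directional cloning. The minimal Python sketch below scans both primers for the HindIII (AAGCTT) and EcoRV (GATATC) recognition sequences; the helper `find_sites` is illustrative, and the left primer is taken as printed with a stray parenthesis dropped.

```python
# Recognition sequences (5'->3') of the two enzymes used to linearize pET32a.
RECOGNITION_SITES = {
    "HindIII": "AAGCTT",  # cuts A^AGCTT, leaving a 5' AGCT overhang
    "EcoRV": "GATATC",    # cuts GAT^ATC, leaving blunt ends
}


def find_sites(sequence: str, site: str) -> list[int]:
    """Return the 0-based start position of every occurrence of `site`."""
    seq = sequence.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


# GlfH1 primers from the text (line-break hyphens removed).
left_primer = "CACCGTCGAACGGCCCT"
right_primer = "ACGAGGAAGCTTAATGCGGGCTGGGGAAAG"

for enzyme, site in RECOGNITION_SITES.items():
    print(enzyme, "left:", find_sites(left_primer, site),
          "right:", find_sites(right_primer, site))
```

Only the right primer contains a site (HindIII at position 6, behind a short 5′ extension); neither primer carries an EcoRV site.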
MSMEG_5877 expression in E. coli-Briefly, the MSMEG_5877 ORF, preceded by an N-terminal tobacco etch virus (TEV) protease cleavage site, was PCR amplified using the primer set 5′-GAGAATCTGTACTTCCAGGGAGTGACAACAGCACCGCGCGC-3′ and 5′-ACGAGGAAGCTTCTACACACGAGCGGTCAGTT-3′. The PCR amplicon was digested with HindIII only and ligated into EcoRV-HindIII-linearized pET32a.
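The 5′ part of the forward primer encodes the TEV protease cleavage site in frame with the MSMEG_5877 sequence. The sketch below translates the primer with the standard genetic code to show this; `translate` and the compact codon-table construction are illustrative helpers, not part of the original protocol.

```python
BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str, frame: int = 0) -> str:
    """Translate the complete codons of `dna` starting at offset `frame`."""
    seq = dna.upper()
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(frame, len(seq) - 2, 3))


forward_primer = "GAGAATCTGTACTTCCAGGGAGTGACAACAGCACCGCGCGC"
peptide = translate(forward_primer)
print(peptide)  # ENLYFQGVTTAPR
```

The first seven residues, ENLYFQG, match the canonical TEV recognition sequence ENLYFQ(G/S), which the protease cleaves between Q and G; the remaining codons continue in the same reading frame.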
MSMEG_5877 knock-out and complementation-To achieve targeted deletion of MSMEG_5877, we first generated a construct containing an allelic exchange substrate for the replacement of MSMEG_5877 with a hygromycin cassette. DNA fragments flanking MSMEG_5877 on the left and right were PCR amplified using the primer sets 5′-AGCTGAACTAGTGTGTACGAAGGGGAGTGGTG-3′ and 5′-ACGACTAAGCTTCGGTGCACCACAAGAACATA-3′ (left arm) and 5′-ACGAGATCTAGAATTCCTGGGACAAGCCCTAC-3′ and 5′-ACGAGAGGTACCGTCAACCCGCTGTCCAAC-3′ (right arm). The left arm was first cloned into SpeI-HindIII-restricted pJSC347 (45), followed by the right arm using the XbaI and KpnI restriction sites, yielding pJSC347-glfH1. For genetic complementation of the ΔglfH1 mutant, a DNA fragment containing the MSMEG_5877 ORF, its putative 5′-UTR and promoter, and a C-terminal human influenza hemagglutinin (HA) tag was PCR amplified with the primer set 5′-GAGAGATCTAGAAAGACACTGAACCTGTGCGCGC-3′ and 5′-CGAGCTGCAGTTAAGCGTAATCTGGAACATCGTATGGGTACACACGAGCGGTCAGTTTGC-3′, digested with XbaI and PstI, and cloned into pMV306 (46) to yield pMV306-MSMEG_5877.
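The complementation reverse primer is written on the antisense strand, so its features are easiest to check after reverse-complementing it. The sketch below (helper names are illustrative) reports the PstI site (CTGCAG) in the primer and translates the reverse complement in all three frames; in one frame the HA epitope YPYDVPDYA is followed directly by a stop codon.

```python
BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an ACGT sequence."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna: str, frame: int = 0) -> str:
    """Translate the complete codons of `dna` starting at offset `frame`."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(frame, len(dna) - 2, 3))


reverse_primer = "CGAGCTGCAGTTAAGCGTAATCTGGAACATCGTATGGGTACACACGAGCGGTCAGTTTGC"
sense = reverse_complement(reverse_primer)

print("PstI site at", reverse_primer.find("CTGCAG"))
for frame in range(3):
    print(frame, translate(sense, frame))
```

Frame 2 reads KLTARV-YPYDVPDYA-stop: the HA peptide is fused in frame to the last codons of the amplified ORF and terminated by TAA, with the PstI site lying downstream of the stop codon.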

Protein production and purification
The sequence-confirmed recombinant plasmid pET32a-Rv3096 was transformed into E. coli BL21 (DE3) by heat shock. Transformants were selected overnight at 37°C on LB agar plates containing 100 µg/ml ampicillin. A single clone was grown overnight at 37°C with agitation in a 10-ml LB preculture containing 100 µg/ml ampicillin, which was then used to inoculate Auto-Induction Media containing 100 µg/ml ampicillin and grown overnight at 25°C with agitation. Cells were harvested by centrifugation for 15 min at 2,450 × g, resuspended in lysis buffer (25 mM Tris, pH 7.5, 300 mM NaCl, 20 mM imidazole) containing 1× Halt™ Protease Inhibitor, EDTA-free (Thermo Scientific), and lysed with a French press (SimoAminco) at 1,650 psi. The soluble fraction and the pellet were separated by centrifugation for 30 min at 13,000 × g. GlfH1 was purified by immobilized metal ion affinity chromatography: the soluble fraction was applied to a 5-ml HisTrap HP column (GE Healthcare Life Sciences) on an ÄKTA star system (GE Healthcare Life Sciences). Three buffers were used: equilibration buffer A, wash buffer A+, and elution buffer B. Purification comprised three steps: loading of the crude protein onto the column at 1 ml/min, incubation with wash buffer A+ at room temperature for 1 h, and elution with 100% buffer B at 1 ml/min. Fractions containing GlfH1 were pooled and diluted 10-fold in buffer A, and the protein was then concentrated by the same procedure on a 1-ml HisTrap HP column (GE Healthcare Life Sciences). Finally, fractions containing GlfH1 were pooled and desalted on a 5-ml HiTrap Desalting column (GE Healthcare Life Sciences) using the ÄKTA star system. Buffer C was used for gel filtration chromatography and for storage of GlfH1. The GlfH1 protein was analyzed by SDS-PAGE, and its concentration was determined by the BCA method according to the manufacturer's protocol.
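With the BCA method, protein concentration is read off a standard curve of known protein amounts. As a minimal sketch, assuming a linear response over the working range (the kit documentation should be consulted for the recommended curve fit), the code below fits absorbance against standard concentration by ordinary least squares and inverts the fit for an unknown sample. The standards are synthetic, not data from this study, and the helper names are hypothetical.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def concentration(absorbance: float, slope: float, intercept: float,
                  dilution_factor: float = 1.0) -> float:
    """Invert the standard curve and correct for any dilution of the sample."""
    return (absorbance - intercept) / slope * dilution_factor


# Synthetic standards (µg/ml) with a perfectly linear response, for illustration only.
standards = [0.0, 2.0, 5.0, 10.0, 20.0, 40.0]
readings = [0.05 + 0.01 * c for c in standards]

slope, intercept = fit_line(standards, readings)
print(round(concentration(0.30, slope, intercept), 3))  # 25.0
```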

Bioinformatics
Sequences of the mycobacterial glycosidases of CAZy subfamily GH5_13 were extracted from their NCBI reference genomes. Each protein sequence was screened for a signal peptide with the SignalP 5.0 server (15), and any predicted signal peptide was removed; residues were then numbered from the first residue of the mature protein. Database searches used NCBI release 230 (February 2019; RRID:SCR_006472), UniProt release 2019_02 (13 February 2019; RRID:SCR_002380), the 20 February 2019 release of RCSB PDB (RRID:SCR_012820), and the February 2019 release of CAZy (RRID:SCR_012909).
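SignalP reports a predicted cleavage site as lying between two precursor positions, so removing the signal peptide and renumbering the mature protein reduces to a fixed offset. The sketch below assumes the cleavage position is given as the last residue of the signal peptide; the toy sequence and helper names are hypothetical.

```python
def mature_sequence(precursor: str, cleavage_after: int) -> str:
    """Remove residues 1..cleavage_after (the predicted signal peptide)."""
    return precursor[cleavage_after:]


def to_mature_position(precursor_position: int, cleavage_after: int) -> int:
    """Convert a 1-based precursor position to 1-based mature-protein numbering."""
    if precursor_position <= cleavage_after:
        raise ValueError("position lies within the signal peptide")
    return precursor_position - cleavage_after


# Toy precursor with a predicted cleavage site between positions 4 and 5.
precursor = "MKKLLAAGSATQ"
mature = mature_sequence(precursor, 4)
print(mature, to_mature_position(5, 4))  # LAAGSATQ 1
```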
Unipro UGENE v1.32 (February 2019) was used for sequence data management, multiple sequence alignment, and phylogenetic analysis (47). Multiple alignments were performed with the MUSCLE algorithm (48) using the default parameters for best accuracy. Phylogenetic analysis was performed with the PHYLIP neighbor-joining method (49), using the Henikoff/Tillier PMB distance matrix model, gamma-distributed rates across sites (coefficient of variation of the substitution rate among sites, 0.50), and 100 bootstrap replicates with an extended majority-rule consensus. The PhyML maximum-likelihood method (50) was also used for confirmation, with the Dayhoff substitution model and 100 bootstrap replicates.
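Neighbor joining builds the tree by repeatedly joining the pair of taxa that minimizes the Q criterion computed from the distance matrix (here, PMB distances). The minimal Python sketch below performs a single joining step on a toy five-taxon matrix, not the GH5_13 data. It illustrates the algorithm only and is not PHYLIP's implementation, which also reduces the matrix after each join and iterates until the tree is resolved.

```python
def nj_join(labels: list[str], d: list[list[float]]) -> tuple[str, str, float, float]:
    """One neighbor-joining step: pick the pair minimizing Q and return the two
    labels with their branch lengths to the new internal node."""
    n = len(labels)
    totals = [sum(row) for row in d]
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            # Q(i, j) = (n - 2) d(i, j) - sum_k d(i, k) - sum_k d(j, k)
            q = (n - 2) * d[i][j] - totals[i] - totals[j]
            if best is None or q < best[0]:
                best = (q, i, j)
    _, i, j = best
    branch_i = 0.5 * d[i][j] + (totals[i] - totals[j]) / (2 * (n - 2))
    return labels[i], labels[j], branch_i, d[i][j] - branch_i


# Toy additive distance matrix (not from this study).
labels = ["a", "b", "c", "d", "e"]
d = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(nj_join(labels, d))  # ('a', 'b', 2.0, 3.0)
```

In the full algorithm, distances from the new node u to every remaining taxon k are then set to (d(i,k) + d(j,k) - d(i,j)) / 2 and the step is repeated on the smaller matrix.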

Initial rate kinetics, pH profile, inhibition, and cation requirement
The activity of GlfH1 was determined using the synthetic substrate p-nitrophenyl β-D-galactofuranoside (pNPGalf). Standard assays contained 0.5 µg of enzyme, 1 mM pNPGalf, and 20 mM of the appropriate buffer in a total volume of 100 µl and were incubated at 37°C for 15 min; reactions were stopped by adding 50 µl of 0.5 M sodium carbonate. Negative controls lacked GlfH1. Assays were performed in 96-well plates, and the absorbance of released p-nitrophenol was measured at 405 nm. To determine the pH optimum of the enzyme, 20 mM citrate buffer (pH 3.0 to 6.0) and 20 mM malate buffer (pH 6.0 to 9.0) were used, with 1 mM pNPGalf as described above. The influence of divalent cations was evaluated by adding 20 mM CaCl2, ZnCl2, MnCl2, MgCl2, or NiCl2, or 20 mM EDTA, to the standard assay. To determine whether Ca2+ is essential, the enzyme was pre-incubated with 10 µmol of EDTA or EGTA at 37°C for 10 min, after which 50 mM CaCl2 was added to the standard assay. Residual activity was measured according to the standard protocol.
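Converting A405 readings into a specific activity requires the Beer-Lambert law. The sketch below assumes a molar absorption coefficient for p-nitrophenolate of 18.0 mM⁻¹ cm⁻¹ at 405 nm and an effective path length of 0.5 cm for 150 µl in a 96-well plate; both values are assumptions that should be replaced by a p-nitrophenol standard curve measured on the plate reader used. The 150-µl final volume (100-µl reaction plus 50 µl of sodium carbonate), the 15-min incubation, and the 0.5 µg of enzyme follow the protocol; the absorbance readings are hypothetical.

```python
def specific_activity(a405_sample: float, a405_blank: float,
                      epsilon_mm_cm: float = 18.0,  # ASSUMED, mM^-1 cm^-1
                      path_cm: float = 0.5,         # ASSUMED effective path length
                      final_volume_ul: float = 150.0,
                      minutes: float = 15.0,
                      enzyme_ug: float = 0.5) -> float:
    """Return µmol p-nitrophenol released per min per mg of enzyme."""
    delta_a = a405_sample - a405_blank            # subtract the no-enzyme control
    pnp_mm = delta_a / (epsilon_mm_cm * path_cm)  # Beer-Lambert: c = A / (epsilon * l)
    pnp_nmol = pnp_mm * final_volume_ul           # 1 mM x 1 µl = 1 nmol
    return (pnp_nmol / 1000.0) / minutes / (enzyme_ug / 1000.0)


# Hypothetical readings: sample well and no-enzyme control.
print(round(specific_activity(0.95, 0.05), 3))  # 2.0
```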

Mechanism type analysis
The stereochemical outcome of hydrolysis by GlfH1 was determined by GC. pNPGalf (2 mM) was hydrolyzed by GlfH1 in the presence of 20% (v/v) methanol, which serves as an alternative glycosyl acceptor, at 37°C for 24 h, and the reaction mixture was lyophilized before derivatization by per-trimethylsilylation as described below.
tate from 0 to 800 mM, and at 25°C. After each run, the column was washed with 200 mM NaOH for 20 min and then with 100 mM NaOH for 10 min to remove contaminants. A pulsed amperometric detector was used with the standard quadruple-potential waveform provided by the manufacturer.
Per-trimethylsilylation analysis-After lyophilization, the monosaccharides released by GlfH1 were derivatized with 20 µl of pyridine and 20 µl of N,O-bis(trimethylsilyl)trifluoroacetamide (BSTFA) at room temperature for 2 h, dried under nitrogen, and taken up in 100 µl of heptane. Derivatives were analyzed by GC/MS on a Thermo Scientific Trace GC Ultra gas chromatograph equipped with a SolGel-1ms™ capillary column (30 m × 0.25 mm, 0.25-µm film; SGE) and interfaced with a TSQ Quantum GC mass spectrometer (Thermo Scientific). The oven temperature was raised from 120°C to 240°C at 2°C/min and held at 240°C for 10 min.
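The oven program fixes the length of each GC/MS run: the 120°C ramp at 2°C/min takes 60 min, followed by the 10-min hold. The small sketch below encodes the programmed temperature as a function of time; cool-down and re-equilibration between injections are not described and are therefore not modeled. Constant and function names are illustrative.

```python
START_C, END_C, RATE_C_PER_MIN, HOLD_MIN = 120.0, 240.0, 2.0, 10.0


def run_time_min() -> float:
    """Ramp duration plus final hold, in minutes."""
    return (END_C - START_C) / RATE_C_PER_MIN + HOLD_MIN


def oven_temperature(t_min: float) -> float:
    """Programmed oven temperature (°C) t_min minutes after injection."""
    if not 0.0 <= t_min <= run_time_min():
        raise ValueError("time outside the programmed run")
    return min(START_C + RATE_C_PER_MIN * t_min, END_C)


print(run_time_min(), oven_temperature(30.0), oven_temperature(65.0))  # 70.0 180.0 240.0
```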
NMR analysis-The soluble AG of M. tuberculosis was incubated with or without GlfH1 for 48 h under standard conditions. The reaction mixtures were lyophilized, exchanged twice in deuterium oxide, and analyzed by high-field 1H and 13C NMR spectroscopy. Spectra were recorded at the Unité de Glycobiologie Structurale et Fonctionnelle (Infrastructure de Recherche-Très Hauts Champs-Résonance Magnétique Nucléaire, CNRS) on 21.1-T spectrometers, at which protons resonate at 900 MHz and 13C at approximately 226 MHz, equipped with a 5-mm triple-resonance inverse (TCI) cryoprobe with cooled 1H, 2H, and 13C channels, a 15N channel, and a z-gradient. All samples were placed in 5-mm tubes matched for D2O. Acetone was added as an internal standard from a stock solution of 2.5 µl of acetone in 10 ml of D2O. All pulse sequences were taken from the Bruker pulse-program library and optimized for each sample.
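The nominal 1H frequency of an NMR spectrometer fixes both its field strength and the resonance frequencies of other nuclei through ν = (γ/2π)·B0. The sketch below uses rounded literature values of γ/2π for 1H and 13C (about 42.577 and 10.708 MHz/T) to compute the field of a 900-MHz instrument and the corresponding 13C frequency; the function names are illustrative.

```python
# Reduced gyromagnetic ratios gamma/(2*pi) in MHz per tesla (rounded literature values).
GAMMA_BAR_MHZ_PER_T = {"1H": 42.577, "13C": 10.708}


def field_from_proton_frequency(nu_1h_mhz: float) -> float:
    """B0 (tesla) of a magnet whose 1H Larmor frequency is nu_1h_mhz."""
    return nu_1h_mhz / GAMMA_BAR_MHZ_PER_T["1H"]


def larmor_frequency_mhz(nucleus: str, b0_tesla: float) -> float:
    """Larmor frequency nu = (gamma / 2 pi) * B0, in MHz."""
    return GAMMA_BAR_MHZ_PER_T[nucleus] * b0_tesla


b0 = field_from_proton_frequency(900.0)
nu_13c = larmor_frequency_mhz("13C", b0)
print(f"B0 = {b0:.2f} T, 13C = {nu_13c:.1f} MHz")
```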

Generation of an MSMEG_5877 deletion mutant (ΔglfH1)
A deletion mutant of MSMEG_5877 was generated using the mycobacterial recombineering strategy (30). Briefly, an exponential-phase culture of M. smegmatis mc2155 carrying pJV53 was washed with PBS and used to inoculate 7H9 medium supplemented with 0.2% succinate, 0.05% Tween 80, and 50 µg/ml kanamycin to an A600 of 0.02. This culture was incubated overnight at 37°C with agitation, after which acetamide was added (0.2% w/v final concentration) and the culture was incubated for an additional 3 h at 37°C with shaking. The recombineering-proficient bacteria were then made electrocompetent by several washes with 10% glycerol-0.05% Tween 80 at 4°C and electrotransformed with 100 ng of an allelic exchange substrate, prepared by digesting pJSC347-glfH1 with NheI and KpnI and gel-purifying the 3.8-kb DNA band. After transformation, cells were recovered in 7H9-OADCT for 4 h at 37°C before being plated on 7H10-OADC plates containing 100 µg/ml hygromycin. After 5 days of incubation at 37°C, colonies were transferred to liquid medium and PCR-genotyped with the primer sets 5′-GACGACCCTAGAGTCCTGTCC-3′ and 5′-CGGGTTCGACGACTTCAC-3′ (left arm) and 5′-GACTGCTCGACACCCTCAC-3′ and 5′-GACACCGCCCCCGGCGCCTGA-3′ (right arm), which produced specific PCR amplicons only in the ΔglfH1 mutant and not in the WT progenitor. The ΔglfH1 mutant was subsequently electrotransformed with the complementation construct pMV306-MSMEG_5877 (see "Plasmid generation"), and complemented colonies were selected on LB plates supplemented with kanamycin and confirmed by PCR.
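Starting the recombineering culture at an A600 of 0.02 is a simple C1V1 = C2V2 dilution. The sketch below assumes that A600 is proportional to cell density at these values; the pre-culture A600 and final volume in the example are hypothetical, as is the helper name.

```python
def inoculum_volume_ml(culture_a600: float, final_volume_ml: float,
                       target_a600: float = 0.02) -> float:
    """Volume of pre-culture to add so that the new culture starts at target_a600."""
    if culture_a600 <= target_a600:
        raise ValueError("pre-culture is not denser than the target")
    return target_a600 * final_volume_ml / culture_a600


# Hypothetical exponential-phase pre-culture at A600 = 0.8, for a 50-ml final culture.
print(round(inoculum_volume_ml(0.8, 50.0), 3))  # 1.25
```

Because `final_volume_ml` is the total culture volume, the inoculum (here 1.25 ml) is added to 48.75 ml of supplemented 7H9.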

Co-culture of M. smegmatis strains and A. polyphaga
A. polyphaga cultures were grown and maintained in PYG (peptone yeast glucose) medium at 28°C. For amoeba infection assays, cells were harvested, washed three times with Page's modified Neff's amoeba saline (PAS), resuspended in fresh PYG medium, and adjusted to a final concentration of 5 × 10⁵ cells ml⁻¹ (10⁵ cells ml⁻¹ for microscopy experiments). M. smegmatis cultures were pelleted, dispersed by passing the bacterial suspension 10 times through a 26-gauge needle, and used to infect A. polyphaga at a multiplicity of infection (m.o.i.) of 10:1. After 6 h of incubation at 32°C, the cells were washed thoroughly three times with PAS buffer and then incubated for 2 h with 100 µg/ml amikacin to kill extracellular mycobacteria. After 2, 24, and 48 h of co-culture, the A. polyphaga monolayer was lysed with 0.5% SDS for 10 min at 32°C, and the number of colony-forming units (CFU) per milliliter was determined for each M. smegmatis strain.
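Two calculations underpin this assay: the bacterial density needed for an m.o.i. of 10 and the CFU/ml recovered from the lysates. The sketch below assumes that bacterial input is counted in CFU and that lysates are serially diluted and plated; the dilution, plated volume, and colony count in the example are hypothetical, as are the helper names.

```python
def bacteria_per_ml_for_moi(amoebae_per_ml: float, moi: float = 10.0) -> float:
    """Bacterial density needed to reach the requested multiplicity of infection."""
    return amoebae_per_ml * moi


def cfu_per_ml(colonies: int, dilution_factor: float, plated_volume_ml: float) -> float:
    """CFU/ml of the undiluted lysate; dilution_factor is e.g. 1e3 for a 10^-3 dilution."""
    return colonies * dilution_factor / plated_volume_ml


# 5 x 10^5 amoebae/ml at an m.o.i. of 10.
print(bacteria_per_ml_for_moi(5e5))  # 5000000.0
# Hypothetical plate: 50 colonies from 0.1 ml of a 10^-3 dilution.
print(cfu_per_ml(50, 1e3, 0.1))
```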