A heterodimeric glutathione S-transferase that stereospecifically breaks lignin's β(R)-aryl ether bond reveals the diversity of bacterial β-etherases

Lignin is a heterogeneous polymer of aromatic subunits that is a major component of lignocellulosic plant biomass. Understanding how microorganisms deconstruct lignin is important for understanding the global carbon cycle and could aid in developing systems for processing plant biomass into valuable commodities. Sphingomonad bacteria use stereospecific glutathione S-transferases (GSTs) called β-etherases to cleave the β-aryl ether (β-O-4) bond, the most common bond between aromatic subunits in lignin. Previously characterized bacterial β-etherases are homodimers that fall into two distinct GST subclasses: LigE homologues, which cleave the β(R) stereoisomer of the bond, and LigF homologues, which cleave the β(S) stereoisomer. Here, we report on a heterodimeric β-etherase (BaeAB) from the sphingomonad Novosphingobium aromaticivorans that stereospecifically cleaves the β(R)-aryl ether bond of the di-aromatic compound β-(2-methoxyphenoxy)-γ-hydroxypropiovanillone (MPHPV). BaeAB's subunits are phylogenetically distinct from each other and from other β-etherases, although they are evolutionarily related to LigF, despite the fact that BaeAB and LigF cleave different β-aryl ether bond stereoisomers. We identify amino acid residues in BaeAB's BaeA subunit important for substrate binding and catalysis, including an asparagine that is proposed to activate the GSH cofactor. We also show that BaeAB homologues from other sphingomonads can cleave β(R)-MPHPV and that they may be as common in bacteria as LigE homologues. Our results suggest that the ability to cleave the β-aryl ether bond arose independently at least twice in GSTs and that BaeAB homologues may be important for cleaving the β(R)-aryl ether bonds of lignin-derived oligomers in nature.

Lignocellulosic plant biomass is one of the most plentiful organic materials on Earth, so deciphering the mechanisms of its decomposition is an important part of understanding the global carbon cycle (1,2). Lignin, which can make up as much as 30% of lignocellulosic biomass, is a heterogeneous polymer composed mainly of phenylpropanoid subunits linked together via several types of chemical bonds (3)(4)(5). Its abundance and composition make lignin a potential renewable raw material for the production of industrial chemicals, including low-molecular weight aromatic compounds, that are currently produced from petroleum (6,7). However, the heterogeneous nature and chemical features of lignin make it difficult to depolymerize into defined products suitable for chemical upgrading. Indeed, many chemical methods for depolymerizing lignin produce heterogeneous mixtures of reactive radical compounds that can readily repolymerize (8). Thus, identifying and characterizing how microbes depolymerize lignin could both increase our understanding of the biological processes by which plant biomass is naturally recycled and aid in the development of industrial systems for the production of chemicals from this abundant renewable resource.
Biological degradation of the lignin polymer in nature is generally initiated by extracellular enzymes produced by fungi and some bacteria (9,10). Sphingomonad bacteria use intracellular enzymes to mineralize the low-molecular weight aromatic oligomers and monomers produced in that initial stage of lignin degradation (11). Several sphingomonads are known or predicted to be capable of precise cleavage of many of the linkages between phenylpropanoid subunits in lignin (11,12), including the ␤-aryl ether (␤-O-4) bond, which can represent Ͼ50% of the total linkages (3). Among microbial lignin degraders, the enzymes used by sphingomonad bacteria to break the intersubunit bonds are of particular interest from an industrial standpoint, as they break the bonds in defined ways to generate predictable products that can readily be directed toward chemical upgrading. Indeed, in vitro systems using sphingomonad enzymes to break the ␤-aryl ether bonds of isolated lignin and/or model compounds have been reported (13)(14)(15)(16)(17).
Although studies of Sphingobium sp. SYK-6 have provided significant understanding of the sphingomonad pathway for breaking the ␤-aryl ether bond, the details of this process are not identical in all sphingomonads. For example, Novosphingobium aromaticivorans DSM 12444 does not contain a stereoselective LigG homologue, but instead uses the Nu-class NaGST Nu to remove the GSH moiety from both stereoisomers of GS-HPV (26) (Fig. 1). Here, we use mutant strains to show that N. aromaticivorans contains a ␤-etherase in addition to its LigE homologue (NaLigE) that stereospecifically breaks the ␤-aryl ether bond of ␤(R)-MPHPV, and we identify this ␤-etherase as a previously uncharacterized heterodimeric GST (named here BaeAB) encoded by Saro_2873 (baeA) and Saro_2872 (baeB). We identify amino acid residues in BaeAB's BaeA subunit involved in substrate binding and catalysis, including an asparagine, whose substitution with alanine dramatically affects catalysis, suggesting that the asparagine may be involved in activating the GSH cofactor. We also show that BaeAB has a catalytic efficiency similar to that of NaLigE. Notably, we find that the BaeA and BaeB polypeptides are phylogenetically distinct from each other and from the previously characterized ␤-etherases, although they are more phylogenetically similar to LigF homologues than to LigE homologues, even though BaeAB and LigF homologues cleave different stereoisomers of the ␤-aryl ether bond (Fig. 1). We also find that BaeAB homologues from the sphingomonads Novosphingobium sp. PP1Y and Sphingobium xenophagum can stereospecifically cleave ␤(R)-MPHPV and that BaeAB homologues are about as common in bacteria as LigE homologues, suggesting that BaeAB homologues may play an important role in breaking the ␤(R)-aryl ether bonds of lignin-derived oligomers in nature. Our results also suggest that the ability to cleave the ␤-aryl ether bond arose independently in the phylogenetically distinct LigF/BaeAB and FuA/LigE GST clusters, and although every individual ␤-etherase thus far characterized is specific for one of the stereoisomers of the bond, each cluster has members that cleave either the ␤(R) or ␤(S) stereoisomer.
As NaLigE was the only known LigE homologue in N. aromaticivorans (29,30) and as LigE homologues were the only sphingomonad ␤-etherases known to break the ␤(R)-aryl ether bond (22), we expected that strain 12444⌬ligE would be unable to completely metabolize erythro-GGE. However, 12444⌬ligE did completely metabolize the erythro-GGE (along with the vanillate), although the racemic MPHPV that transiently accumulated in the culture medium was metabolized more slowly by this strain than by strain 12444⌬1879 (Fig. 2, A and B). In addition, as the MPHPV disappeared from the 12444⌬ligE culture medium, HPV transiently accumulated in the medium to a maximum concentration roughly equal to the initial erythro-GGE concentration (Fig. 2B), indicating that the GGE was completely converted into HPV (presumably via racemic MPHPV; Fig. 1). These results suggest that strain 12444⌬ligE contains one or more enzymes that are capable of breaking the ␤-aryl ether bond of ␤(R)-MPHPV.

Discovery of a heterodimeric bacterial ␤-etherase
dance when cells are grown in the presence of GGE versus its absence. 3 We separately deleted Saro_2872 and Saro_2873 from the genome of strain 12444⌬ligE and grew the resulting strains (12444⌬ligE⌬2872 and 12444⌬ligE⌬2873, respectively) in the presence of vanillate and erythro-GGE. Both strains consumed the vanillate, but neither strain could fully metabolize the GGE (Fig. 2, D and E), instead accumulating extracellular MPHPV to a concentration roughly one-half of the culture's initial erythro-GGE concentration.
To identify the MPHPV stereoisomer(s) that remained in the media of the 12444⌬ligE⌬2872 and 12444⌬ligE⌬2873 cultures, the medium from each culture was filtered, and aliquots were combined with GSH and recombinant NaLigE or NaLigF1 (encoded by Saro_2091), which are known to stereospecifically cleave the ␤(R) or ␤(S) isomers of MPHPV, respectively (29) (Fig. 1). For each sample, the addition of NaLigE resulted in conversion of most of the MPHPV into GS-HPV and guaiacol (Fig. 2, G and J), and the addition of NaLigF1 resulted in conversion of Ͻ10% of the MPHPV into GS-HPV and guaiacol (Fig.  2, H and K). These results indicate that the MPHPV remaining in the 12444⌬ligE⌬2872 and 12444⌬ligE⌬2873 cultures was predominantly the ␤(R) isomer and thus that strain 12444⌬ligE requires both Saro_2872 and Saro_2873 for complete metabolism of ␤(R)-MPHPV.
To test the role of Saro_2872 in cells containing a functional NaLigE protein, we deleted Saro_2872 from strain 12444⌬1879 to generate strain 12444⌬2872. When 12444⌬2872 was grown in the presence of vanillate and erythro-GGE, it completely metabolized both substrates, although it consumed the transiently accumulated extracellular MPHPV more slowly than 12444⌬1879 (Fig. 2, A and C), similar to the case for 12444⌬ligE ( Fig. 2B). Thus, it appeared that in a strain containing NaLigE, Saro_2872 is not needed for complete metabolism of MPHPV.

The Saro_2873 (BaeA) and Saro_2872 (BaeB) polypeptides form a heterodimeric enzyme that stereospecifically cleaves ␤(R)-MPHPV
As our mutant results suggested that the Saro_2873 and Saro_2872 gene products contribute to MPHPV cleavage in N. aromaticivorans, we sought to test whether they performed this reaction in vitro. Attempts to separately produce the individual Saro_2873 and Saro_2872 polypeptides from recombinant Escherichia coli cultures were unsuccessful, but we were able to separately produce each polypeptide using a cell-free system (Fig. 3A). When the Saro_2873 or Saro_2872 polypeptides were individually incubated with racemic (␤(R) and ␤(S)) MPHPV and GSH, only a trace amount of GS-HPV was observed (representing Ͻ1% conversion of the MPHPV present in the reaction) ( Fig. 4 (A-C) and Fig. S1 (A-C)), even after several days. (Under these same reaction conditions, the ␤-etherases NaLigE and NaLigF1 are active (Fig. 4, G and H).) The apparent inability of the individual Saro_2873 and Saro_2872 polypeptides to cleave MPHPV, combined with our observation that loss of either Saro_2873 or Saro_2872 rendered 12444⌬ligE unable to metabolize ␤(R)-MPHPV (Fig. 2, D and E) and the fact that Saro_2873 and Saro_2872 overlap in the N. aromaticivorans genome (Fig. S2) suggested that the polypeptides may form a heterooligomeric complex. Indeed, when we repeated our enzyme assay using a mixture of the Saro_2873 and Saro_2872 polypeptides, half of the racemic MPHPV substrate was converted into GS-HPV and guaiacol, suggesting that the individual polypeptides formed a catalytically active enzyme complex in the reaction mixture ( Fig. 4D and Fig. S1D).
To determine which stereoisomer(s) of MPHPV remained after this reaction, we added recombinant NaLigE or NaLigF1 to The dehydrogenases NaLigD, NaLigO, and NaLigN are predicted to oxidize GGE based on homology to enzymes in Sphingobium sp. SYK-6 (18) and the fact that their transcript abundances in N. aromaticivorans are increased by the presence of GGE (26). NaLigF1, NaLigF2, NaLigE, and NaGST Nu have been characterized in vitro (26,28,29), and NaGST Nu has been found to perform its reaction in vivo (26). This study reports the discovery and characterization of the stereospecific heterodimeric ␤-etherase BaeAB (in boldface type). The genes that code for the enzymes are shown in parentheses. The ␣, ␤, and ␥ carbons of the phenylpropanoid propyl chain are labeled in ␣(S)␤(R)-GGE. Each chiral molecule is labeled with its stereoconfiguration. Erythro-GGE used in growth experiments consists of the ␣(S)␤(R) and ␣(R)␤(S) stereoisomers; threo-GGE consists of the ␣(R)␤(R) and ␣(S)␤(S) stereoisomers. Inset, vanillate, which was included in our growth experiments along with erythro-GGE (Fig. 2).

Discovery of a heterodimeric bacterial ␤-etherase
aliquots of the reaction mixture. Upon the addition of NaLigE, there were no detectable changes in the amounts of MPHPV, GS-HPV, or guaiacol ( Fig. 4E and Fig. S1E). In contrast, upon the addition of NaLigF1, the MPHPV was completely converted into GS-HPV and guaiacol ( Fig. 4F and Fig. S1F), indicating that the MPHPV that remained in the reaction mixture that contained both the Saro_2873 and Saro_2872 polypeptides was the ␤(S) stereoisomer, consistent with our hypothesis that the Saro_2873-Saro_2872 enzyme complex cleaves ␤(R)-MPHPV.
To further test this hypothesis, we cloned Saro_2873 and Saro_2872 into a single expression vector and observed that significant amounts of soluble recombinant protein were produced in E. coli harboring this plasmid (Fig. 3B, lane B2). This suggested that our difficulty in expressing each of the polypeptides individually in E. coli may have been because they could not fold correctly in the absence of the other polypeptide and thus were subject to proteolytic cleavage. In addition, despite the fact that a His 8 tag was attached only to the Saro_2872 polypeptide (with an intervening peptide containing a tobacco etch virus (TEV) protease recognition site), the protein purified from E. coli lysate using Ni 2ϩ -chelate chromatography was composed of two polypeptides, with apparent molecular weights as assessed by SDS-PAGE similar to the expected molecular weights of the individual His 8 -Saro_2872 and Saro_2873 polypeptides (ϳ30 kDa) (Fig. 3B, lane B3). When the His 8 tag was cleaved from the protein using TEV protease, neither of the polypeptides bound to Ni 2ϩ -chelate resin, and one of the polypeptides decreased in molecular weight, as assessed by SDS-PAGE (Fig. 3B, lane B4). These results supported the idea that the polypeptides bound to each other to form an enzyme complex. We also analyzed the purified recombinant protein complex by gel permeation chromatography (GPC) and found that it eluted as a single peak with an apparent molecular weight of ϳ59 kDa (Fig. 5). This suggested that the complex was a heterodimer, consistent with the fact that previously characterized GSTs generally function as (homoor hetero-) dimers (31,32).
We incubated the purified recombinant heterodimer with enzymatically generated enantiopure samples of ␤(S)or ␤(R)-MPHPV and GSH ( Fig. 4 (G and H) and Fig. S3 (A-C)). Consistent with our results with the cell-free generated polypeptides, the recombinant enzyme had no effect on ␤(S)-MPHPV and converted ␤(R)-MPHPV into GS-HPV and guaiacol ( Fig. 4 (I and J) and Fig. S3 (D and E)). Thus, we concluded that the

Amino acid substitutions in the BaeA subunit affect catalysis by the BaeAB heterodimer, but an analogous substitution in the BaeB subunit has no effect on catalysis
We sought to gain insight into the relative contributions of the BaeA and BaeB subunits to the catalytic activity of BaeAB by modifying the predicted active-site residues in the individual subunits. Each of the variant BaeAB proteins described below were expressed in and purified from E. coli, and SDS-PAGE analysis indicated that each one contained both a BaeA and BaeB subunit. In addition, each variant enzyme eluted as a single peak in GPC analysis, with an apparent molecular weight that was indistinguishable from that of the WT enzyme (Fig. 5), suggesting that any impact of the amino acid substitutions on enzyme activity is not due to large-scale defects in folding or changes in the oligomeric state of the enzyme.
As attempts to determine the crystal structure of BaeAB have thus far been unsuccessful, we ran a search of the SWISS-MODEL template library of structurally characterized proteins (33) and found that BaeA and BaeB are each most similar in amino acid sequence to LigF from Sphingobium sp. SYK-6 (SLG_08650; WP_014075191.1; Protein Data Bank code 4XT0) (even though LigF and BaeAB cleave different stereoisomers of the ␤-aryl ether bond). We aligned the amino acid sequences of BaeA and BaeB with that of LigF to formulate predictions for which of BaeAB's residues may be involved in catalysis (Fig.   6A). LigF contains an active-site serine residue (Ser 14 ) previously found to be important for cleaving the MPHPV analogue ␤(S)-(1Ј-formyl-3Ј-methoxyphenoxy)-␥-hydroxypropioveratrone (25). This serine residue is conserved in all LigF homologues previously shown to have ␤-etherase activity and in both BaeA (Ser 15 ) and BaeB (Ser 14 ) ( Fig. 6A and Fig. S4). To test the roles of these serine residues in BaeAB, we substituted them with alanines individually (A:S15A and B:S14A) and together (A:S15A/B:S14A) and investigated the in vitro kinetics of cleaving the ␤-aryl ether bond of ␤(R)-MPHPV by WT BaeAB and the variant enzymes (Fig. S5). We found that the variant in which BaeA Ser 15 was changed to an alanine (A:S15A) had a k cat value similar to that of WT BaeAB and a K m value that was 5-6-fold higher than that of WT BaeAB (Table 1), suggesting that the side-chain hydroxyl of BaeA Ser 15 is involved in substrate binding, but not in catalytic turnover of BaeAB. The variant in which BaeB Ser 14 was changed to an alanine (B:S14A) had the same k cat and K m values as WT BaeAB ( Table 1), suggesting that BaeB Ser 14 is not involved in catalytic turnover or substrate binding by BaeAB. The variant in which both BaeA Ser 15 and BaeB Ser 14 were changed to alanines (A:S15A/B:S14A) behaved the same as the A:S15A variant (Table 1), supporting the importance of BaeA Ser 15 in substrate binding and the noninvolvement of BaeB Ser 14 in catalysis by BaeAB.
The three-dimensional structure of LigF shows that the sidechain amide nitrogen of its Asn 13 is within hydrogen-bonding distance (3.3 Å) of the bound GSH thiol (Fig. 6B), suggesting that the asparagine may be involved in activating the GSH thiol Denatured proteins were separated on a NuPAGE 12% BisTris gel using NuPAGE MOPS SDS running buffer (Invitrogen). A, cell-free generated polypeptides. B, recombinant proteins. Lane A1, cell-free reaction mixture to produce the Saro_2873 polypeptide (BaeA; indicated by an arrow). Lane A2, cell-free reaction mixture to produce the Saro_2872 polypeptide containing an N-terminal His 6 tag and intervening peptide linker (BaeB; indicated by an arrow). Lanes A3 and B1, ladder, labeled with molecular weight in kDa (Novex Sharp pre-stained protein standard; Invitrogen). Lane B2, crude E. coli lysate containing recombinant BaeAB, with a His 8 tag and intervening peptide linker on the BaeB subunit N terminus. Lane B3, elution from a Ni 2ϩ -NTA column using imidazole after passing the crude E. coli lysate from lane B2 through the column. Lane B4, flow-through from a Ni 2ϩ -NTA column after cleaving the His 8 tag off the protein in lane B3. Lane B5, elution from a Ni 2ϩ -NTA column using imidazole after passing crude E. coli lysate containing recombinant PP1Y-BaeAB with a His 8 tag and intervening peptide linker on the PP1Y-BaeA N terminus through the column. Lane B6, flow-through from a Ni 2ϩ -NTA column after cleaving the His 8 tag off the protein in lane B5. Lane B7, elution from a Ni 2ϩ -NTA column using imidazole after passing crude E. coli lysate containing recombinant SxeBaeAB with a His 8 tag and intervening peptide linker on the SxeBaeA N terminus through the column. Lane B8, flow-through from a Ni 2ϩ -NTA column after cleaving the His 8 tag off the protein in lane B7. The denatured polypeptides did not all run at their expected molecular weights, possibly due to differences in pI values. C, predicted molecular weights (MW) (kDa) and isoelectric points (pI) of polypeptides analyzed via SDS-PAGE, calculated using ExPASy (52). TEV protease recognition sites are underlined in the sequences.

Discovery of a heterodimeric bacterial ␤-etherase
(GST reactivity typically involves activation of a GSH molecule by stabilizing it in its thiolate form through hydrogen-bonding with an enzyme active-site residue (34)). We found that an analogous asparagine is present in all of the LigF homologues previously shown to have ␤-etherase activity and in BaeA (Asn 14 ), although BaeB has an alanine in this position (Fig. 6A). Activesite asparagine residues are known (35,36) or predicted (26) to be involved in catalysis by some other GSTs. When we replaced Asn 14 in BaeA with an alanine, we found that the resulting variant (A:N14A) had a k cat value ϳ20-fold lower and a K m value ϳ12.5-fold higher than WT BaeAB ( Table 1), suggesting that this asparagine side chain is critical for both substrate binding and turnover in BaeAB. This large negative effect of substituting a single amino acid residue in BaeA on catalysis by BaeAB suggests that BaeA plays a major role in the heterodimer's reactivity. Indeed, if the BaeA subunit were completely inactivated in the A:N14A variant, a comparison of k cat /K m values for WT BaeAB and the A:N14A variant suggests that BaeB has a catalytic efficiency with ␤(R)-MPHPV as substrate that is Ͻ0.3% of BaeA's catalytic efficiency. In addition, if the GSH binding pocket in BaeB is structurally similar to that of LigF, our amino acid sequence alignment predicts that there may not be any polar residues in BaeB other than Ser 14 in a position to activate the thiol of a GSH molecule bound in BaeB (Fig. 6A), which would be consistent with BaeB having a lower catalytic activity than BaeA.  75 and 150 kDa). The data for the proteins of known molecular weight were fit to a cubic curve, which was used to calculate the apparent molecular weights of WT BaeAB and the BaeAB variants. The individual BaeA and BaeB polypeptides are predicted to be ϳ29 kDa (Fig. 3).

Comparing BaeAB and NaLigE catalysis
We sought to compare the catalytic abilities of BaeAB and NaLigE, the two N. aromaticivorans enzymes now known to cleave the ␤(R)-aryl ether bond. For this, we tested purified recombinant NaLigE for cleavage of ␤(R)-MPHPV in vitro under conditions identical to those used to test BaeAB (Fig.  S5) and found that NaLigE had a ϳ4-fold lower k cat value and a ϳ3-fold lower apparent K m value than BaeAB, leading to a measured catalytic efficiency (k cat /K m ) for NaLigE that is slightly lower than that of BaeAB (Table 1). Thus, the newly discovered heterodimeric ␤-etherase BaeAB is catalytically comparable with, or even more efficient than, NaLigE at cleaving ␤(R)-MPHPV.

The stereospecificity of BaeAB is unexpected from its phylogeny
The five bacterial ␤-etherases that have previously been shown to break the ␤(R)-aryl ether bond (LigE homologues; Fig.  S6) share Ͼ59% amino acid sequence identity with each other, and the five bacterial ␤-etherases that have previously been shown to break the ␤(S)-aryl ether bond (LigF homologues; Fig.  6A and Fig. S6) share Ͼ39% amino acid sequence identity with each other (four of these LigF homologues share Ͼ60% sequence identity between them, but NaLigF2 is only ϳ40% identical to the others). In contrast, the LigE and LigF homologues share Ͻ22% amino acid sequence identity between them (Fig. S6). An amino acid sequence analysis of all of the GSTs involved in the sphingomonad pathway for breaking the ␤-aryl ether bond shows that the LigF and LigE homologues form separate phylogenetic clades (Fig. 7), consistent with the low sequence identity between them. In addition, the LigF homologues are split into two related, but distinct, subclades, which we refer to as LigF1 and LigF2 homologues (Fig. 7).
BaeA and BaeB share ϳ24% amino acid sequence identity with each other (Fig. 6A and Fig. S6) and fall into separate phylogenetic clades (Fig. 7). They are also both distinct in amino acid sequence from other ␤-etherases (Fig. S6): BaeA is Ͻ36% and Ͻ17% identical to the LigF and LigE homologues, respectively, and BaeB is Ͻ30% and Ͻ18% identical to the LigF and LigE homologues, respectively. However, BaeA and BaeB are both phylogenetically closer to the LigF homologues than to the LigE homologues (Fig. 7), despite the fact that BaeAB cleaves  (25) is highlighted in green. The BaeA and BaeB residues substituted by alanine in this study are highlighted in blue. B, active site of LigF from Sphingobium sp. SYK-6 (Protein Data Bank code 4XT0). Carbon atoms for the LigF N-terminal thioredoxin domain and for the bound GSH are colored green and magenta, respectively. Oxygen, nitrogen, and sulfur atoms are colored red, blue, and yellow, respectively. The LigF C-terminal GST domain is colored gray. Distances in Angstroms are shown for the GSH thiol to the side-chain hydroxyl of Ser 14 and to the side-chain amide nitrogen of Asn 13 .

Table 1 Kinetic parameters for the enzymatic conversion of ␤(R)-MPHPV into GS-HPV
Parameters are from non-linear least-squares best fits of the Michaelis-Menten equation to plots of initial rate ([GS-HPV] formed/s) versus ␤(R)-MPHPV concentration (Fig. S5). For the BaeAB enzymes, the concentrations of the heterodimers were used in the equation; for NaLigE, the concentration of the individual polypeptide subunits of the homodimer was used.

Distributions of BaeAB and LigE homologues among bacteria
Based on a BLASTp search of the NCBI database (performed in October 2018), the genomes of 43 species were predicted to contain genes for homologues of both BaeA and BaeB (using an amino acid sequence identity Ͼ50% and a bit score Ͼ200 as criteria), and 48 species were predicted to contain at least one gene for a LigE homologue (six of these contain genes for two LigE homologues) (Table S1). Of these bacteria, 22 (including N. aromaticivorans, Novosphingobium sp. MBES04, and Novosphingobium sp. SG61-1L, which are known to metabolize erythro-GGE (19,26,30)) were predicted to encode both a BaeAB and a LigE homologue. These results suggest that BaeAB homologues may be about as common in bacteria as LigE homologues.
In all of the bacteria predicted to contain a BaeAB homologue, the BaeA structural gene is directly upstream of the gene for BaeB, and in all but one case, the genes overlap (Table S1), an arrangement that is often seen for polypeptides that form a heterooligomeric complex, further supporting the prediction that these species contain active BaeAB heterodimers. Interestingly, there is greater amino acid sequence identity among the BaeA homologues than among the BaeB homologues (Fig. S7), which is consistent with our proposal that the BaeA subunit may contribute more to catalysis by BaeAB than BaeB.

BaeAB homologues from other sphingomonads stereospecifically cleave ␤(R)-MPHPV
Among the bacteria predicted to contain a homologue of BaeAB are Novosphingobium sp. PP1Y and S. xenophagum. The putative PP1Y-BaeA and PP1Y-BaeB polypeptides from Novosphingobium sp. PP1Y are 74 and 65% identical in amino acid sequence, respectively, to BaeA and BaeB from N. aromaticivorans (Table S1 and Fig. S7). The putative SxeBaeA and Sxe-BaeB polypeptides from S. xenophagum are 70 and 52% identical in amino acid sequence, respectively, to BaeA and BaeB (Table S1 and Fig. S7). To test whether BaeAB homologues from these species have the same activity as BaeAB from N. aromaticivorans, we cloned the predicted baeA and baeB genes from Novosphingobium sp. PP1Y and S. xenophagum into plas- For proteins that have been tested for activity, nodes are marked with a circle color-coded for the observed catalytic activities (shown in the key). Proteins shown to catalyze a reaction include the LigF, BaeA, and BaeB homologues analyzed in Fig. 6, as well as NaLigE (Saro_2405; WP_011446047.1) and NaGST Nu (Saro_2595; WP_011446237.1) from N. aromaticivorans (26,28,29) (26); and Ds-GST1 (XP_007363869.1) from D. squalens (39). Tested proteins for which no measurable activity was observed are LigP-SC (WP_012238501.1) from S. cellulosum (28) and RpHypGST (RPA4340; WP_027276361.1) from R. palustris (29). Unmarked nodes represent proteins that have not been tested for activity (for a full list, see Table S3). Selected bootstrap values are displayed. Class names are shown in parentheses for clusters previously proposed to be members of established GST classes (see "Discussion"). A Newick file for the phylogenetic tree is included in the supporting information.

Discovery of a heterodimeric bacterial ␤-etherase
mids and expressed the recombinant proteins in E. coli. For both PP1Y-BaeAB and SxeBaeAB, two polypeptides were purified using Ni 2ϩ -chelate chromatography from the E. coli lysates, despite the fact that only the BaeA polypeptides contained a His 8 tag (Fig. 3B, lanes 5-8), indicating that the proteins were heterodimers, like BaeAB from N. aromaticivorans. When we separately mixed PP1Y-BaeAB and Sxe-BaeAB with enantiopure samples of ␤(S)or ␤(R)-MPHPV and GSH, neither enzyme cleaved the ␤(S)-MPHPV, but both enzymes converted ␤(R)-MPHPV into GS-HPV and guaiacol (Fig. 4 (K-N) and Fig. S8), precisely as we found with BaeAB from N. aromaticivorans.

Discussion
Characterizing microbial strategies for lignin breakdown is important for understanding plant biomass turnover in nature and could aid in developing industrial systems for producing commodity chemicals from this abundant renewable resource.
Here, we provide new information on the pathway used by sphingomonad bacteria to cleave the ␤-aryl ether bond commonly found in lignin. Specifically, we report on a previously uncharacterized heterodimeric ␤-etherase from N. aromaticivorans and other sphingomonads (BaeAB) with unique properties.

BaeAB is a unique bacterial ␤-etherase
BaeAB is different from previously characterized bacterial ␤-etherases in several respects. Most notably, although BaeA and BaeB are phylogenetically distinct from the previously characterized ␤-etherases, they are more similar to the LigF homologues (which cleave the ␤(S)-aryl ether bond) than to the LigE homologues (which cleave the ␤(R)-aryl ether bond) (Fig. 7), even though BaeAB cleaves the ␤(R) isomer of the bond. This shows that the amino acid sequences of a ␤-etherase's polypeptides cannot alone be used to predict its stereospecificity.
In addition, BaeAB homologues are heterodimers, in contrast to the LigF and LigE homologues, which function as homodimers (25). Our conclusion that BaeAB is a heterodimer is supported by three lines of evidence. First, the individual BaeA and BaeB polypeptides showed little if any catalytic activity (Fig. 4, B and C), but activity was observed when they were combined or expressed in the same cell (Fig. 4, D and J). Second, both BaeA and BaeB were purified using Ni 2ϩ -chelate chromatography, despite the fact that only one polypeptide contained a His 8 tag (Fig. 3B, lane B3). Finally, GPC analysis indicated that the purified BaeAB protein complex has an apparent molecular weight of ϳ59 kDa (Fig. 5), whereas BaeA and BaeB are each predicted to have molecular weights of ϳ29 kDa (Fig. 3).
The fact that substitution of a single amino acid residue (Asn 14 ) in BaeA had a large negative effect on catalysis by BaeAB (Table 1), and our observation that amino acid sequence conservation appears to be more critical for BaeA than for BaeB (Fig. S7) suggests that the BaeA subunit makes a larger contribution to catalysis in BaeAB than in BaeB. One possible explanation for our results is that the main role of the BaeB subunit may be to form a dimer interface with BaeA that allows BaeA to correctly orient the ␤(R)-aryl ether substrate in its active site. Indeed, based on their three-dimensional structures, it was pro-posed that the homodimeric interfaces of LigE and LigF are important for determining their substrate stereospecificities (25). Another possible explanation for our result with the A:N14A variant enzyme is that the BaeB subunit has GST activity with some substrate(s) other than MPHPV.

Diversity of ␤-etherases and implications for the evolution of the ability to break the ␤-aryl ether bond
Previous analyses of LigE and LigF homologues suggested that they are phylogenetically distinct from each other (29,37). LigE from SYK-6 is ϳ23% identical in amino acid sequence to members of the fungal FuA (formerly GTE, or etherase-like) GST class (23,24), and its three-dimensional structure is most similar to those of members of this class (25). In contrast, structural analysis suggested that SYK-6 LigF may belong to a previously unrecognized class of GSTs (25). In a GST classification system based on a similarity network analysis, LigE is a member of the Main.28 subgroup, and LigF is not a member of any established Main subgroup (38). Our analysis shows that LigF homologues are phylogenetically more similar to GST Nu homologues (members of the Nu class of GSTs (26)) and LigG homologues (which share some features of the Omega class (27)) than to the LigE homologues (Fig. 7). In addition, our analysis suggests that there are two related, yet distinct, LigF homologue subclades, which we refer to as the LigF1 and LigF2 homologues (Fig. 7).
It is unclear how best to classify BaeA and BaeB. Neither are members of an established Main subgroup in the similarity network GST classification system (38). Despite being phylogenetically related to the LigF homologues, BaeA and BaeB may not warrant formal inclusion in the LigF GST class, because of their low amino acid sequence homologies to other members of this class (Ͻ36% and Ͻ30% identical, respectively; Fig. S6) (an amino acid sequence identity of ϳ40% is often used as a cutoff for inclusion in a GST class (37)). The low sequence identity between BaeA and BaeB (ϳ24%; Fig. S6) also argues against them being members of the same GST class as each other, although, paradoxically, the fact that they hybridize together argues for classifying them as members of the same class, as GST heterodimers are typically composed of members of the same class, and the ability to hybridize is another criterion used in GST classification (31).
Because the LigE and LigF homologues are so different in structure and amino acid sequence, it has been proposed that they evolved independently to specifically cleave either the ␤(R) or ␤(S) stereoisomer of the ␤-aryl ether bond, respectively (25). It was recently reported that the FuA GST, Ds-GST1, from the white-rot fungus Dichomitus squalens specifically cleaves the ␤(S)-aryl ether bond (39), although it clusters closer to the LigE homologues than to the LigF homologues in our phylogenetic tree (Fig. 7). Together with the unexpected stereospecificity of BaeAB, these results suggest that the ability to break the ␤-aryl ether bond did arise separately in the LigF/BaeAB and FuA/ LigE GST clusters but that, even though every individual ␤-etherase characterized thus far has been stereospecific, neither cluster evolved to specifically cleave only one of the stereoisomers of the bond. In addition, not all members of these two clusters appear to be capable of breaking the ␤-aryl ether Discovery of a heterodimeric bacterial ␤-etherase bond, as demonstrated by the inability of LigE-like GSTs from Sorangium cellulosum (28) and Rhodopseudomonas palustris (29) (which share ϳ39 and ϳ37% sequence identity with NaLigE, respectively) to cleave substrates containing either stereoisomer of the bond (Fig. 7).
Our finding that BaeA Asn 14 is involved in catalysis by BaeAB suggests that the analogous asparagine present in the LigF homologues (Fig. 6A) may also be involved in catalysis by those enzymes. The proximity of the side-chain amide nitrogen of Asn 13 to the GSH thiol in the LigF structure (3.3 Å; Fig. 6B), along with the fact that Ser 14 is the only other residue with a polar side chain near the thiol in the LigF structure (Fig. 6B), suggest that the conserved asparagine may be involved in activating the GSH cofactor (which is typically stabilized in its thiolate form via hydrogen-bonding with a residue side chain in characterized GSTs (34)). If the conserved asparagine were activating the GSH in these enzymes, they would be unique among characterized GSTs, which typically use a residue with a side-chain hydroxyl (such as tyrosine) to activate the GSH (31). It would also suggest different reaction mechanisms between the LigF/BaeAB and FuA/LigE ␤-etherases, as LigE does not contain an active-site asparagine (25) and Ds-GST1 is not predicted to contain one (39). Interestingly, although NaLigE and BaeAB apparently evolved the ability to cleave the ␤(R)-aryl ether bond separately, and presumably utilize different reaction mechanisms, the catalytic efficiencies of these two enzymes are comparable ( Table 1).

The relative roles of BaeAB and LigE in breaking the ␤(R)-aryl ether bond
The fact that N. aromaticivorans strains deficient in either LigE (12444⌬ligE) or BaeAB (12444⌬2872) metabolized MPHPV more slowly than their parent strain (12444⌬1879) (Fig. 2, A-C) shows that both enzymes contribute to metabolism of ␤(R)-MPHPV in WT N. aromaticivorans, although neither individual enzyme is essential. In addition, the fact that strains lacking both BaeAB and NaLigE (12444⌬ligE⌬2872 and 12444⌬ligE⌬2873) were unable to fully metabolize MPHPV (Fig. 2, D and E) suggests that these are the only enzymes that N. aromaticivorans uses to cleave ␤(R)-MPHPV. These results suggest that individual organisms may only require either a LigE or BaeAB homologue to metabolize compounds containing the ␤(R)-aryl ether bond. Indeed, the genome of Sphingobium sp. SYK-6, which encodes two active LigE homologues (LigE and LigP) (11), does not contain a putative BaeAB homologue (Table S1). Conversely, Sphingobium sp. YG1, which is reported to metabolize GGE (40), contains genes for a putative BaeAB homologue but is not predicted to encode a LigE homologue (Table S1). The fact that an N. aromaticivorans strain containing both BaeAB and NaLigE metabolized MPHPV faster than mutants lacking one of these enzymes (Fig. 2, A-C) also suggests that the reason that some organisms contain both a BaeAB and a LigE homologue (or two LigE homologues) (Table S1) may be that it increases the intracellular rate of ␤-aryl ether bond breaking. Otherwise, although BaeAB and NaLigE reacted similarly with ␤(R)-MPHPV (Table 1), the enzymes may react differently from each other with other olig-omers containing the ␤(R)-aryl ether bond, as would be derived from real lignin.

Conclusions
Prior to this work, LigE homologues were the only known bacterial enzymes capable of specifically breaking the ␤(R)-aryl ether bond in a defined way. We showed that BaeAB, which is distinct from previously characterized ␤-etherases in several ways, can perform this reaction at least as well as LigE homologues and that BaeAB homologues may be as widespread among bacteria as LigE homologues. These results suggest that the previously unrecognized BaeAB homologues may play an important role in degradation of lignin-derived aromatic oligomers by sphingomonad bacteria in nature. In sum, our work shows that there is more variability among individual sphingomonads in their pathway for breaking the ␤-aryl ether bond than previously thought and illustrates the importance of studying this pathway, and probably other pathways involved in metabolizing lignin-derived compounds, in multiple species in order to better understand this diversity.

Bacterial growth media
E. coli strains used for creating plasmids were grown in lysogeny broth, and shaken at ϳ200 rpm at 37°C. For routine storage and manipulation, N. aromaticivorans cultures were grown in lysogeny broth or GluSis at 30°C. GluSis is a modification of Sistrom's minimal medium (43) in which the succinate has been replaced by 22.6 mM glucose. N. aromaticivorans growth experiments used standard mineral base (SMB) minimal medium (26), at an initial pH of 7.0. Where needed to select for the presence or absence of plasmids, media were supplemented with 100 g/ml ampicillin, 50 g/ml kanamycin, 20 g/ml chloramphenicol, or 10% sucrose (w/v).

N. aromaticivorans growth experiments
Starter cultures of N. aromaticivorans were grown in 4 ml of SMB containing 4 mM vanillate. Experimental cultures were grown in 20 -30 ml of SMB containing 3 mM vanillate and 1 mM erythro-GGE, in 125-ml conical growth flasks shaken at 200 rpm at 30°C. Aliquots (400 -600 l) were removed at various Discovery of a heterodimeric bacterial ␤-etherase time points and passed through 0.22-m filters (e.g. Whatman Puradisc filters; GE Healthcare) before HPLC analysis of extracellular aromatics. Every culture was grown at least three times; data shown are from representative cultures.
For the 12444⌬ligE⌬2872 and 12444⌬ligE⌬2873 cultures, we filtered Ͼ2 ml for the final time points. To determine which stereoisomer(s) of MPHPV were present in these samples, they were split into three 400-l aliquots and combined with 5 mM GSH and either H 2 O, recombinant NaLigE (90 g/ml), or recombinant NaLigF1 (147 g/ml) and incubated at 30°C for 1 h. These samples were analyzed via HPLC to assay for MPHPV cleavage into GS-HPV and guaiacol as described below.
Expression plasmids were transformed into E. coli B834 (44,45) containing plasmid pRARE2 (Novagen). Recombinant proteins were expressed by growing the E. coli strains for ϳ25 h at 25°C in ZYM-5052 autoinduction medium (46) containing kanamycin and chloramphenicol. Recombinant proteins were purified by passing crude E. coli lysates through a column packed with Ni 2ϩ -NTA resin (Qiagen or Thermo Scientific) as described previously (26), except using gravity-flow columns instead of an FPLC system. After incubation of purified proteins with His 6 -TEV protease, the polypeptides that originally contained a His 8 tag retained a Ser-Ala-Ile-Ala-Gly-peptide on their N termini, derived from the linker between the protein and the TEV protease recognition site. (For BaeAB with the His tag on BaeB, Met 1 was not coded for, so the SAIAG-peptide was linked to residue Ser 2 .) The cleaved His 8 tags and His 6 -TEV protease were removed via a second round of Ni 2ϩ -NTA resin chromatography. Recombinant NaLigF1 and a version of NaLigE that had originally contained a His tag on its C terminus were purified as described previously (29). Recombinant protein concentrationsweredeterminedviatheBradfordmethod(absor-bance at 595 nm), using BSA standards of known concentrations (Pierce, Thermo Scientific) and protein assay dye reagent from Bio-Rad.

Cell-free protein synthesis
We individually cloned Saro_2872 and Saro_2873 into plasmid pEU-NGFP (47), removing the GFP gene in the process (for details, see supporting information). Cell-free protein synthesis was performed essentially as described previously (48). The Saro_2872 polypeptide (BaeB) contained a His 6 tag with an intervening TEV protease recognition site on its N terminus that was not removed. The Saro_2873 polypeptide (BaeA) was synthesized in its native form. Synthesized polypeptides were assayed for enzyme activity using aliquots directly from the cell-free synthesis reaction. Concentrations of BaeA and BaeB in the synthesis reaction mixtures were approximated using the intensities of the polypeptides in an SDS-polyacrylamide gel (49).

Gel permeation chromatography
A Wyatt (Santa Barbara, CA) size-exclusion chromatography analytical column for membrane proteins (5 m, 300 Å, 7.8 ϫ 300 mm) was attached to a Prominence HPLC (Shimadzu; Kyoto, Japan) equipped with an LC-20AD pump, a CTO-20A column oven (35°C), and a SPD-M20A diode array detector (monitoring 250 -500 nm). The buffer was 5 mM HEPES (pH 7.5), 100 mM NaCl, and 0.5 mM tris(2-carboxyethyl)phosphine, run at a flow rate of 1 ml/min. Protein retention times were determined using absorbance at 280 nm. Proteins of known molecular weights used to generate a standard curve were purchased from Sigma-Aldrich (bovine heart cytochrome c, Glycine max trypsin inhibitor, bovine erythrocyte carbonic anhydrase, and BSA) or GE Healthcare (RNase A, ovalbumin, and conalbumin; contained in a gel filtration calibration kit), except for NaGST Nu , which was purified as described previously (26).

Enzyme activity assays
For assays with cell-free synthesized BaeA and BaeB polypeptides, 0.1 mM racemic (␤(S) and ␤(R)) MPHPV was combined with 5.8 mM GSH in enzyme reaction buffer (ERB; 25 mM Tris-HCl (pH 8.0) and 25 mM NaCl). Cell-free protein synthesis mixtures containing BaeA or BaeB were added individually or together to the MPHPV/GSH solutions to achieve concentrations of ϳ24 nM of each polypeptide. These 625-l reactions were incubated at 30°C for 24 h to several days. Each was then split into 190-l aliquots and combined with an additional 2.3 mM GSH and either H 2 O, 151 g/ml NaLigE, or 184 g/ml NaLigF1. These 212-l reactions were incubated at 30°C for several hours and then analyzed via HPLC.

Kinetics of the enzymatic cleavage of ␤(R)-MPHPV
Various concentrations of racemic (␤(S) and ␤(R)) MPHPV were combined with 5 mM GSH in ERB. At time 0, 100 l of an indicated enzyme in ERB ϩ 5 mM GSH was combined with 1 ml of the racemic MPHPV/GSH sample (both pre-equilibrated to 25°C) at 25°C. The final concentration of ␤(R)-MPHPV in each reaction was 0.0045, 0.010, 0.017, 0.068, or 0.13 mM. Final enzyme concentrations were 18 nM BaeAB, 23 nM BaeAB (B:S14A), 22 nM BaeAB (A:S15A), 24 nM BaeAB (A:S15A/B: S14A), 98 nM BaeAB (A:N14A), and 70 nM NaLigE (all BaeAB concentrations are for the heterodimers; the NaLigE concentration is for the individual polypeptides that make up the functional homodimers). At different time points, 200 l of a reaction was removed and combined with 40 l of 1 M HCl (Acros Organics, Geel, Belgium) to terminate the enzyme reaction before HPLC analysis to quantify GS-HPV formed. Control reactions were allowed to proceed for several hours to ensure that only the ␤(R)-MPHPV in the reaction mixtures was cleaved.

HPLC analysis
Analysis and quantification of aromatic compounds were performed using an Ultra AQ C18 (5-m; 250 ϫ 4.6-mm) column (Restek, Bellefonte, PA) attached to a System Gold HPLC (Beckman Coulter, Brea, CA) with buffers described in Fig. S9. In this system, the following pairs of stereoisomers ran as single peaks: ␣(S)␤(R) and ␣(R)␤(S) erythro-GGE, ␣(R)␤(R) and ␣(S)␤(S) threo-GGE, ␤(R) and ␤(S) MPHPV, and ␤(R) and ␤(S) GS-HPV. The eluent was analyzed for light absorbance between 191 and 600 nm, and absorbances at 280 nm were used for quantification of aromatic metabolites by comparing peak areas with those of standards.

Phylogenetic analysis of GSTs involved in breaking the ␤-aryl ether bond
Proteins used in the analysis are listed in Table S3. Proteins were chosen by performing BLASTp searches of the NCBI nonredundant protein database using the amino acid sequences of NaLigE, NaLigF1, NaLigF2, BaeA, and BaeB as queries. Every third hit up to 25 proteins total from each search was combined into a single list, and any duplicate hits from the different searches were removed. To this list were added all of the GSTs that have been tested for the ability to perform one of the reactions involved in the sphingomonad pathway for breaking the ␤-aryl ether bond, if they were not already part of the list (see legend to Fig. 7). Protein sequences were aligned using MAFFT in MegAlign Pro, which is part of the Lasergene version 14.0 suite (DNASTAR, Madison, WI). A phylogenetic tree was calculated via the maximum likelihood method in RAxML version 8.2.3 (50), using 100 rapid bootstrap inferences. The tree was visualized using Interactive Tree of Life version 3 (51).