Identification of glucosyltransferase genes involved in sinapate metabolism and lignin synthesis in Arabidopsis.

Sinapic acid is a major phenylpropanoid in Brassicaceae providing intermediates in two distinct metabolic pathways leading to sinapoyl esters and lignin synthesis. Glucosyltransferases play key roles in the formation of these intermediates, either through the production of the high energy compound 1-O-sinapoylglucose leading to sinapoylmalate and sinapoylcholine or through the production of sinapyl alcohol-4-O-glucoside, potentially leading to the syringyl units found in lignins. While the importance of these glucosyltransferases has been recognized for more than 20 years, their corresponding genes have not been identified. Combining sequence information in the Arabidopsis genomic data base with biochemical data from screening the activity of recombinant proteins in vitro, we have now identified five gene sequences encoding enzymes that can glucosylate sinapic acid, sinapyl alcohol, and their related phenylpropanoids. The data provide a foundation for future understanding and manipulation of sinapate metabolism and lignin biology in Arabidopsis.

Plasticity, both in terms of development and metabolism, is a key feature of plants, most probably arising through the need of sedentary organisms to respond rapidly to prevailing environmental conditions. Phenylpropanoid metabolism is one example of extreme plasticity in which reactions can lead to a wide diversity of products functioning either in their own right or acting as gateways into different metabolic pathways (1). Lignin polymers are major end products of phenylpropanoid metabolism, providing compressive strength and water resistance to the protein-polysaccharide matrix of plant cell walls and conferring general protection against microbial attack (2,3). Other pathways arising from phenylpropanoids include the synthesis and modification of flavonoids, such as those acting as free radical scavengers, signaling molecules, and anti-microbial compounds (4), and the benzoate pathways leading to compounds such as salicylic acid (5).
Among the many different gene functions contributing to this plasticity are those that correspond to modifying enzymes such as the multigene families encoding P450 hydroxylases (6), methyltransferases (7), and the glycosyltransferases (8). Typically in plants, the group of glycosyltransferases (UDP-glucosyltransferases; UGTs) involved in these modifications is characterized by the presence of a signature motif located toward the C terminus of the polypeptide (9, 10). The reactions most often involve the transfer of glucose from UDP-glucose to the second substrate, leading either to the formation of a glucose ester or to the formation of a glucoside. Whereas glucose esters have long been recognized to be high energy compounds acting as transient intermediates in the formation of other metabolites (11), glucosides have increased water solubility, provide access to membrane transport systems, and can act as storage forms of the aglycone (8).
A classic example of these different consequences of glucosylation is the formation of the glucose ester and glucoside of sinapic acid and the glucoside of sinapyl alcohol. Brassicaceae such as Arabidopsis predominantly accumulate sinapoyl esters. 1-O-Sinapoylglucose (a glucose ester) is the intermediate in the synthesis of sinapoylmalate, a putative UV protectant located in the leaf epidermis of the plant (12,13), and of sinapoylcholine (14). Sinapoylcholine is made only during seed development and is degraded during germination to provide sinapic acid for the seedling, which is converted again via 1-O-sinapoylglucose to sinapoylmalate (15). In contrast to the role of the glucose ester of sinapic acid, formation of the more soluble glucoside of sinapic acid (sinapoyl-4-O-glucoside) may be related to storage or detoxification, involving removal of the metabolite from the cytoplasm through transfer into the vacuole (8). Sinapyl alcohol-4-O-glucoside, also known as syringin, is considered to be involved in lignin synthesis, since it is thought that glucosylation of the three monolignols (sinapyl alcohol, coniferyl alcohol, and p-coumaryl alcohol) may aid transport of the monomers out of the cell for polymerization into lignin in muro (3). Recently, a specific glucosidase of coniferin (coniferyl alcohol-4-O-glucoside) has been localized at the differentiating xylem, providing some support for these events in lignin assembly (16).
One way of investigating the role of these three glucosylation reactions in planta is to manipulate the level of expression of the genes encoding the respective UGTs and to analyze the phenotypic consequences. Despite considerable interest in the enzymes over many years, the proteins have not been purified to homogeneity as yet, nor have their genes been cloned. This study describes an alternative approach, where we have used information from the genomic data bases of Arabidopsis to identify sequences containing the UGT signature motif (10). In parallel to a phylogenetic analysis of these sequences, we have screened recombinant proteins for their activities in vitro toward sinapates and their related phenylpropanoids. The data in this study describe the identification of five genes that show relevant specificities in vitro, thereby providing a new foundation for defining their roles in the plant.

EXPERIMENTAL PROCEDURES
Chemicals-The majority of the chemicals and phenolic compounds (Fig. 1) used in this study were purchased from Sigma. Coniferyl aldehyde, sinapyl aldehyde, and sinapyl alcohol were purchased from Apin Chemicals Ltd. p-Coumaryl aldehyde and p-coumaryl alcohol were supplied courtesy of John Ralph (Department of Forestry, University of Wisconsin).
Construction of GST-UGT Expression Plasmids-DNA fragments corresponding to putative UGT sequences with no introns (10) were amplified from Arabidopsis thaliana Columbia genomic DNA by polymerase chain reaction. For those sequences containing introns, full-length expressed sequence tags obtained from the Arabidopsis Biological Resource Center were used to construct the expression plasmids. Specific oligomer sets were designed according to the sequences of the UGT genes (10). The polymerase chain reactions were set up following the conditions described previously (17). The polymerase chain reaction products were electroeluted from a 1% agarose gel. After purification with phenol/chloroform, the DNA fragments were subcloned into the appropriate restriction sites in the multiple cloning site of the GST gene fusion vector pGEX-2T (Amersham Pharmacia Biotech).
Recombinant UGT Purification-Recombinant UGTs were expressed as fusion proteins, each containing a GST fusion partner at the N terminus. To prepare large quantities of recombinant proteins, Escherichia coli strain XL1-Blue was grown at 20°C in 500 ml of 2× YT medium containing 50 µg/ml ampicillin until the A600 reached 1.0. The culture was then incubated with 1 mM isopropyl-1-thio-β-D-galactopyranoside for 24 h at 20°C to induce synthesis of the GST-UGT fusion proteins. Cells were harvested (5000 × g for 5 min), resuspended (5 ml of ice-cold phosphate-buffered saline), osmotically shocked (18), and centrifuged again (40,000 × g for 5 min). The supernatant was mixed with 100 µl of 50% glutathione-coupled Sepharose (Amersham Pharmacia Biotech), the beads were washed with phosphate-buffered saline, and adsorbed proteins were eluted with 20 mM reduced form glutathione, 100 mM Tris-HCl, pH 8.0, 120 mM NaCl according to the manufacturer's instructions.
The protein assays were carried out with Bio-Rad Protein Assay Dye using bovine serum albumin as reference. The purified recombinant proteins were also analyzed by SDS-polyacrylamide gel electrophoresis following the methods described by Sambrook et al. (19).
Glucosyltransferase Activity Assay-The assay mix (200 µl) contained 0.2 µg of recombinant protein, 14 mM 2-mercaptoethanol, 5 mM UDP-glucose, and 1 mM phenylpropanoid substrate. Initial screening for activity of the 36 proteins against each of the 11 potential substrates was carried out at pH 7.0 (100 mM Tris-HCl) and 30°C for 30 min. For detailed kinetic analysis of the enzymes showing significant activity, reactions leading to 4-O-glucosides were carried out at pH 7.0/20°C/30 min, and those leading to glucose esters were carried out at pH 6.0 (potassium phosphate)/20°C/30 min due to their pH optima and linearity of the reactions. Reactions were stopped by the addition of 20 µl of trichloroacetic acid (240 mg/ml), quick-frozen, and stored at −20°C prior to the reverse phase HPLC analysis. The specific enzyme activity was expressed as nmol of phenylpropanoids glucosylated/s (nkat) by 1 mg of protein in 30 min of reaction time. Alkaline hydrolysis was carried out in 0.1 N NaOH at room temperature for 1 h and neutralized by 3 M sodium acetate, pH 5.2.
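The conversion from a measured substrate turnover to a specific activity in nkat/mg can be illustrated with a short calculation. The sketch below is not part of the original methods: the function name and the example conversion value are hypothetical, and it assumes the assay quantities are 200 µl of mix, 1 mM substrate, and 0.2 µg of protein, with the reaction linear over the full 30 min.

```python
def specific_activity_nkat_per_mg(fraction_converted,
                                  substrate_mM=1.0,
                                  volume_ul=200.0,
                                  protein_ug=0.2,
                                  minutes=30.0):
    """Return specific activity in nkat/mg (nmol glucosylated per s per mg protein).

    Hypothetical helper; default values are the assumed assay quantities.
    """
    # mM x µl gives nmol: 1 mM in 200 µl corresponds to 200 nmol of substrate
    substrate_nmol = substrate_mM * volume_ul
    product_nmol = fraction_converted * substrate_nmol
    seconds = minutes * 60.0
    protein_mg = protein_ug / 1000.0
    # Average rate over the incubation, normalized to protein mass
    return product_nmol / seconds / protein_mg


# Illustrative value only: 10% of the substrate converted in one assay
example = specific_activity_nkat_per_mg(0.10)
print(round(example, 2))
```

Because the rate is averaged over the whole incubation, the value is only meaningful where product formation is linear with time, which is the reason given for the choice of assay conditions.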
¹H NMR Analysis-The glucosides for NMR analysis were purified using the HPLC methods described above. The samples were freeze-dried and resuspended in deuterated methanol. The NMR spectra were acquired on a Bruker AMX 500-MHz NMR spectrometer at 22°C. The data were processed and analyzed using Bruker XWIN-NMR software, version 2.6.
Computer Analysis of Sequence-The sequence analyses were carried out using Genetics Computer Group software (Wisconsin package, version 10.1).

RESULTS
Preparation of Recombinant Proteins-Screening the GenBank™ data base with the UGT signature motif has revealed a large multigene family of putative UGT sequences in Arabidopsis (10,20,21). The sequences were named following the standardized system of the UGT Nomenclature Committee (22) and were classified into subgroups based on homology comparisons, which were confirmed through detailed phylogenetic analysis (10).
To gain insight into the biochemical properties of the gene products, 36 sequences were used to produce recombinant fusion proteins with GST in E. coli. The relatedness of the sequences chosen for expression and the purified recombinant proteins used for the biochemical assays are shown in Fig. 2. Following purification using glutathione Sepharose as an affinity matrix, some of these fusions proved to be unstable, releasing GST (26 kDa) as a separate polypeptide, a common observation with this fusion system (23).
Screening for Glucosyltransferase Activity-The 36 recombinant proteins were screened for glucosyltransferase activity using UDP-glucose as the sugar donor and each of 11 closely related phenylpropanoids (as shown in Fig. 1) as substrates under identical assay conditions. When the reaction mixes were analyzed using HPLC, only five proteins (UGT84A1, UGT84A2, UGT84A3, UGT72E2, and UGT72E3) showed significant activity toward the cinnamic acids and alcohols; 25 proteins showed no activity toward any of the substrates, and a further six displayed only trace activities and are not described in detail. None of the 36 proteins were able to glucosylate the aldehydes (data not shown). Results from this screening are summarized in Fig. 2. The control, using GST alone, was unable to glucosylate any of the phenylpropanoids (data not shown). Of the 25 recombinant proteins that showed no activity toward any of the substrates tested in Fig. 1, 10 showed some activity toward scopoletin, implying that the proteins were catalytically active in vitro (data not shown). As yet, no substrate has been identified for the remaining 15 proteins. While these were purified from soluble fractions of the E. coli lysate, it remains unknown whether they are catalytically active under the conditions used in the in vitro assays.

FIG. 2. ... (10) with the prefix UGT omitted for clarity. Corresponding GST-UGT fusion proteins were purified from E. coli and were analyzed using 10% (w/v) SDS-polyacrylamide gel electrophoresis. The proteins were visualized with Coomassie staining. Large quantities of these recombinant proteins were purified and incubated individually with the 11 substrates shown in Fig. 1. Each assay contained 0.2 µg of recombinant protein, 5 mM UDP-glucose, 1 mM phenylpropanoid substrate, and 100 mM Tris-HCl, pH 7.0. The mix was incubated at 30°C for 30 min and was analyzed by reverse phase HPLC. Proteins forming glucose esters (●) and glucosides (○) are highlighted, together with significant (■) or trace (□) enzyme activity, which is defined as conversion of >5% or <5%, respectively, relative to the maximum conversion (100%) observed for each substrate.
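The scoring of activities as significant or trace relative to the maximum conversion observed for each substrate can be sketched as follows. This is an illustrative reading of the 5% criterion, not the procedure actually used in the study: the function name, the enzyme and substrate labels, and the conversion values are hypothetical, and a value of exactly 5% is assumed here to count as trace, since that boundary case is not specified.

```python
def classify_activities(conversion, threshold=0.05):
    """Label each enzyme/substrate pair as 'significant', 'trace', or 'none'.

    conversion: {enzyme: {substrate: amount of substrate converted}}
    Hypothetical helper illustrating the 5% criterion.
    """
    # Maximum conversion observed for each substrate across all enzymes (the 100% reference)
    max_per_substrate = {}
    for per_enzyme in conversion.values():
        for substrate, value in per_enzyme.items():
            current = max_per_substrate.get(substrate, 0.0)
            max_per_substrate[substrate] = max(current, value)

    labels = {}
    for enzyme, per_enzyme in conversion.items():
        labels[enzyme] = {}
        for substrate, value in per_enzyme.items():
            top = max_per_substrate[substrate]
            if value <= 0 or top <= 0:
                label = "none"
            elif value / top > threshold:
                label = "significant"
            else:
                # Assumption: exactly 5% of the maximum is scored as trace
                label = "trace"
            labels[enzyme][substrate] = label
    return labels


# Invented conversion values, for illustration only
example = {
    "enzyme_1": {"substrate_a": 80.0, "substrate_b": 0.0},
    "enzyme_2": {"substrate_a": 2.0, "substrate_b": 0.0},
    "enzyme_3": {"substrate_a": 0.0, "substrate_b": 10.0},
}
print(classify_activities(example))
```

Normalizing per substrate rather than across the whole screen means that a protein is scored against the best enzyme for that particular compound, so the labels are not directly comparable between substrates.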

Characterization of Glucose Ester and 4-O-Glucoside Reaction Products-Glucosylation
alcohol-4-O-glucoside). As shown in Fig. 3B, this product is also stable in 1 N NaOH. The identities of the three glucose conjugates were further confirmed by NMR analysis (Table I) using NMR spectra assigned according to published information (24,25).
UGTs Catalyzing the Formation of Cinnamate Glucose Esters in Vitro-From the initial screening as shown in Fig. 2, three enzymes, UGT84A1, UGT84A2, and UGT84A3, showed significant activity in forming glucose ester conjugates with the cinnamic acids in vitro. The specificity and kinetics of these enzymes were analyzed in detail. As shown in Fig. 4, UGT84A2 clearly shows the highest specificity for sinapic acid, has a low Km toward this substrate, and is virtually inactive toward other cinnamic acids. Both UGT84A1 and UGT84A3 can glucosylate sinapic acid, but they have a higher Km than UGT84A2 and are also active toward other substrates. UGT84A1 is the only enzyme that shows significant activity and has high affinity toward caffeic acid. While UGT84A1 also displays a strong activity toward p-coumaric acid, the affinity of the enzyme toward the substrate is low. UGT84A3 similarly has a broad enzyme activity toward a number of substrates, but a comparison of the Michaelis-Menten kinetics suggests that ferulic acid may be the preferred substrate under the conditions used.
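Comparisons of this kind rest on estimates of Km and Vmax from initial rates measured at several substrate concentrations. How the kinetic parameters were fitted is not stated here, so the sketch below is only one possible approach: a Hanes-Woolf linearization, in which [S]/v plotted against [S] has slope 1/Vmax and intercept Km/Vmax. The function names and the data points are hypothetical.

```python
def michaelis_menten(s, vmax, km):
    """Initial rate predicted by the Michaelis-Menten equation."""
    return vmax * s / (km + s)


def fit_hanes_woolf(substrate, rate):
    """Estimate (Vmax, Km) by least-squares regression of [S]/v on [S].

    Assumed method for illustration; the fitting procedure used in the
    study is not described in the text.
    """
    xs = list(substrate)
    ys = [s / v for s, v in zip(substrate, rate)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                 # equals 1 / Vmax
    intercept = mean_y - slope * mean_x   # equals Km / Vmax
    vmax = 1.0 / slope
    km = intercept * vmax
    return vmax, km


# Synthetic, noise-free rates generated from invented parameters (Vmax = 10, Km = 0.5 mM)
concentrations = [0.1, 0.25, 0.5, 1.0, 2.0]
rates = [michaelis_menten(s, 10.0, 0.5) for s in concentrations]
vmax_est, km_est = fit_hanes_woolf(concentrations, rates)
print(round(vmax_est, 3), round(km_est, 3))
```

A lower Km indicates that half-maximal rate is reached at a lower substrate concentration, which is the sense in which UGT84A2 is described as having higher affinity for sinapic acid than UGT84A1 or UGT84A3. With real, noisy data a direct nonlinear fit is generally preferable to any linearization.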
UGTs Catalyzing the Formation of Cinnamate Glucosides in Vitro-The two enzymes, UGT72E2 and UGT72E3, which produced 4-O-glucosides, were active only against the four substrates shown in Fig. 5. Whereas UGT72E2 and UGT72E3 glucosylate sinapyl alcohol at high levels of specific activity, only UGT72E2 showed activity with coniferyl alcohol. From a comparison of the Michaelis-Menten kinetics, UGT72E2 is likely to glucosylate sinapyl alcohol, whereas UGT72E3 may be the enzyme responsible for conversion of sinapic acid into its glucoside.

DISCUSSION
Our aim in this study has been to identify enzymes capable of glucosylating sinapic acid and sinapyl alcohol in vitro, as a foundation for going on to study their role in the formation of sinapoylmalate and lignin in planta. We have used Arabidopsis thaliana as the model, since the availability of gene sequence information in the data bases has enabled us to gain insight rapidly into a family of sequences containing a UGT signature motif (10). Phylogenetic analysis of the Arabidopsis UGT multigene family has revealed 12 groups (10) that in turn have provided a predictive framework for screening recombinant proteins for catalytic activities of interest. The data described in this study highlight the efficiency of such an approach, since we have identified five UGTs expressing the relevant activities when assayed in vitro.
Sinapoylmalate has been suggested to act as a foliar UV protectant in Arabidopsis (12, 13), although as yet there is no direct supportive evidence. The biosynthetic pathway leading to sinapoylmalate in the Brassicaceae is well characterized biochemically (15), and Arabidopsis genes encoding the enzymes upstream and downstream of UGT involvement have been identified by mutational analysis (26,27). Study of the fah1 mutant showed that the seedlings were more susceptible than wild type to UV stress (12). Since the FAH1 locus encodes ferulate-5-hydroxylase, a cytochrome P450-dependent monooxygenase responsible for the formation of 5-hydroxyferulic acid, the precursor of sinapic acid (28), the data implied that the product of the reaction, 5-hydroxyferulic acid, or metabolites downstream of 5-hydroxyferulic acid, such as sinapic acid and sinapoylmalate, were involved in UV protection. However, recent analyses of Arabidopsis overexpressing FAH1 have shown no accumulation of sinapoylmalate (29), suggesting that levels of FAH1 do not control flux through this part of the cinnamate pathway. Since the glucose ester is the direct precursor of sinapoylmalate, manipulation of UGT levels involved in its formation may provide a better tool with which to investigate the potential link between sinapoylmalate and UV protection.
The kinetic data we have gained from the in vitro assays now suggest that UGT84A2 corresponds to the UGT responsible for synthesis of the glucose ester intermediate in the sinapoylmalate pathway. The specificity of this UGT toward sinapic acid is surprisingly high and contrasts with that shown for UGT84A1 and UGT84A3 that can form the glucose ester of sinapic acid but can also glucosylate other cinnamic acids. As yet, we do not know the relationship of these in vitro analyses to physiological events in the plant. For example, it is not known whether the same UGT is involved in glucosylation of sinapic acid in the leaves and in the developing seed. In the leaves, a glucose ester is converted to sinapoylmalate, whereas in seeds, a glucose ester is converted to sinapoylcholine (14). Gene knock-outs in UGT84A1, UGT84A2, or UGT84A3 and metabolite profiling of the transgenic plants will provide important insights into these possibilities. Similarly, the cellular specificity and regulation of expression of the three genes will also provide a context for understanding the role of the gene products in the Arabidopsis plant.
A common feature of the three genes UGT84A1, UGT84A2, and UGT84A3 is that they encode UGTs that all form glucose esters. Interestingly, the sequences are located within the same branch of the multigene family (Fig. 2; Ref. 10). Another sequence closely related on the basis of homology and located in the same branch is UGT84A4, but in contrast, the recombinant enzyme only showed trace activity in forming glucose esters with cinnamic acids when assayed under identical conditions in vitro. A similar trace activity toward cinnamic acids was also observed with UGT84B1 (Fig. 2), but other studies have shown this UGT to be highly specific in forming the glucose ester of indole-3-acetic acid (30).

Whereas sinapate ester metabolism is of almost exclusive relevance to plant species of the Brassicaceae, lignin synthesis and metabolism impact more generally on our understanding of plant cell walls and their determining role in development and defense responses. Many recent reviews have addressed the potential role of the glucosides of the monolignols in lignin assembly (reviewed in Ref. 3). Typically, the glucosides are considered to represent the transport forms of the monolignols, with their respective UGTs and glucosidases acting sequentially in the assembly process. In conifers, the existence of coniferin (coniferyl alcohol-4-O-glucoside) is well established, and recently the gene encoding a coniferin-specific glucosidase has been identified and shown to release the aglycone in vitro (16). To date, however, no gene encoding UGTs of monolignols has been identified from any plant species, and the biochemical work that has been undertaken for more than 20 years has involved partially purified protein fractions (31).
The data described in this report now provide new tools with which to understand the role of UGTs in lignin synthesis. The Arabidopsis genes UGT72E2 and UGT72E3 encode enzymes that can glucosylate coniferyl alcohol and sinapyl alcohol in vitro. Analysis of their cell-specific expression and regulation, together with metabolite profiling following knock-outs or overexpression of the genes, will provide important contributions to the ongoing debates surrounding lignin biology. For example, if the glucosides of the monolignols are essential precursors to lignin, then knocking out UGT72E2 or UGT72E3 should affect the composition of the lignin synthesized.
As of March 1, 2000, 98 sequences corresponding to putative UGTs have been identified in the Arabidopsis genome (10), suggesting that in the total genome there may be as many as 120 sequences containing the UGT signature motif. Surprisingly, phylogenetic analysis of these additional sequences indicates that none is closely related to those encoding the UGTs involved in glucose ester and 4-O-glucoside formation that are described in this study (10). While many early studies have biochemically analyzed UGT activities purified or partially purified from a wide range of different plant families (32-42), there have been no previous attempts to study this multigene family in a single species and therefore no possibility of directly comparing relative activities across family members. Access to the kinetic analysis of many family members from a single species can provide increased confidence in the range of substrates that may be used by each of these enzymes in vitro and thereby a foundation for exploring their substrates in vivo and their physiological roles in the plant. Of equal relevance is the wider use of UGTs for industrial applications. Studies building on known substrate specificities across the multigene family and their experimental modification by processes such as DNA shuffling can now provide a new platform for designing biotransformations in vivo and in vitro.

FIG. 3. Alkaline hydrolysis of the glucose conjugates of sinapic acid and sinapyl alcohol. Each assay contained 0.2 µg of recombinant protein, 5 mM UDP-glucose, 1 mM phenylpropanoid substrate, and 100 mM Tris-HCl, pH 7.0. Following preincubation at 30°C for 30 min, the reaction mix was transferred to room temperature for 1 h in the presence (+) or absence (−) of 1 N NaOH. All of the reaction mixtures were analyzed by reverse phase HPLC. GST protein was used as negative control to show that this fusion partner does not catalyze the glucosylation reaction. A, sinapic acid was used as substrate in the assay. B, sinapyl alcohol was used as substrate in the assay.