Independent evolution of rosmarinic acid biosynthesis in two sister families under the Lamiids clade of flowering plants

As a means to maintain their sessile lifestyle amid challenging environments, plants produce an enormous diversity of compounds as chemical defenses against biotic and abiotic insults. The underpinning metabolic pathways that support the biosynthesis of these specialized chemicals in divergent plant species provide a rich arena for understanding the molecular evolution of complex metabolic traits. Rosmarinic acid (RA) is a phenolic natural product first discovered in plants of the mint family (Lamiaceae) and is recognized for its wide range of medicinal properties and potential applications in human dietary and medical interventions. Interestingly, the RA chemotype is present sporadically in multiple taxa of flowering plants as well as some hornworts and ferns, prompting the question whether its biosynthesis arose independently across different lineages. Here we report the elucidation of the RA biosynthetic pathway in Phacelia campanularia (desert bells). This species represents the borage family (Boraginaceae), an RA-producing family closely related to the Lamiaceae within the Lamiids clade. Using a multi-omics approach in combination with functional characterization of candidate genes both in vitro and in vivo, we found that RA biosynthesis in P. campanularia involves specific activities of a BAHD acyltransferase and two cytochrome P450 hydroxylases. Further phylogenetic and comparative structure–function analyses of the P. campanularia RA biosynthetic enzymes clearly indicate that RA biosynthesis has evolved independently at least twice in the Lamiids, an exemplary case of chemotypic convergence through disparate evolutionary trajectories.

As a means to maintain their sessile lifestyle amid challenging environments, plants produce an enormous diversity of compounds as chemical defenses against biotic and abiotic insults. The underpinning metabolic pathways that support the biosynthesis of these specialized chemicals in divergent plant species provide a rich arena for understanding the molecular evolution of complex metabolic traits. Rosmarinic acid (RA) is a phenolic natural product first discovered in plants of the mint family (Lamiaceae) and is recognized for its wide range of medicinal properties and potential applications in human dietary and medical interventions. Interestingly, the RA chemotype is present sporadically in multiple taxa of flowering plants as well as some hornworts and ferns, prompting the question whether its biosynthesis arose independently across different lineages. Here we report the elucidation of the RA biosynthetic pathway in Phacelia campanularia (desert bells). This species represents the borage family (Boraginaceae), an RA-producing family closely related to the Lamiaceae within the Lamiids clade. Using a multi-omics approach in combination with functional characterization of candidate genes both in vitro and in vivo, we found that RA biosynthesis in P. campanularia involves specific activities of a BAHD acyltransferase and two cytochrome P450 hydroxylases. Further phylogenetic and comparative structure-function analyses of the P. campanularia RA biosynthetic enzymes clearly indicate that RA biosynthesis has evolved independently at least twice in the Lamiids, an exemplary case of chemotypic convergence through disparate evolutionary trajectories.
Plants of the mint family (Lamiaceae, order Lamiales), such as basil, rosemary, lavender, and mint, are used worldwide as popular aromatic culinary herbs. A number of Lamiaceae spe-cies have also been used medicinally for centuries in countries such as China, Lebanon, and Uzbekistan (1)(2)(3). Some reported medicinal properties of Lamiaceae plants, including antimicrobial, anxiolytic, anti-inflammatory and anti-cancer properties, have been attributed to the rich and diverse specialized metabolites present in these plants (4 -7).
Rosmarinic acid (RA), 3 a specialized phenolic ester abundantly present in most Lamiaceae plants ( Fig. 1), has garnered particular attention because of its multifaceted medicinal properties and potential for treating various human diseases (8 -11). For example, RA is recognized for its antioxidant and neuroprotective properties (11,12). Multiple animal studies have demonstrated the anti-inflammatory activity of RA through repression of the NF-B and STAT3 signaling pathways (13,14). Additionally, RA is an inhibitor of GABA transaminase and therefore serves as an antianxiety agent by modulating GABA levels in the central nervous system (15). RA is also an inhibitor of indoleamine-pyrrole 2,3-dioxygenase, which is an immune checkpoint molecule and has recently emerged as an important target for cancer immunotherapy (16). The potential utility of RA as a therapeutic agent and in other biotechnological applications has propelled research regarding RA biosynthesis in Lamiaceae (17).
Although common in the Lamiaceae, RA is a specialized metabolite that is distributed only sporadically throughout the plant kingdom (Fig. 1), primarily within angiosperms but also in some ferns and hornworts (20). This scattered taxonomic distribution pattern in land plants suggests the complex evolutionary history of RA biosynthesis. RA biosynthesis could have emerged once in early land plants and subsequently been lost in many taxa. Independent loss of ancestral metabolic traits has been well-documented in the case of pale beach mice, where the light pigment phenotypes of beach mouse populations from different parts of the world have been found to be caused by independently arising mutations (21). Alternatively, RA biosynthesis might have evolved independently multiple times during land plant evolution, which has been a common evolutionary phenomenon in plant-specialized metabolism (22). To date, there is no sufficient evidence to distinguish between these two evolutionary scenarios.
Here we examined a family of plants within the Lamiids clade, closely related to Lamiaceae: the borage family (Boraginaceae, order Boraginales). Previous studies have demonstrated that many sampled members of the Boraginaceae also produce RA, like the Lamiaceae (20,(23)(24)(25). In contrast, plants from Solanales, an order immediately sister to Boraginales (Fig.  1), are not RA producers (20). Notably, no other family under the order Lamiales besides the Lamiaceae has been shown to produce RA (20). This intriguing distribution of RA in taxa closely related to and within the Lamiids led us to investigate the biosynthetic origin of RA in Boraginaceae. Using a candidate-based approach, we discovered the RA biosynthetic pathway in the Boraginaceae plant Phacelia campanularia and provide substantial evidence supporting independent evolution of RA in two closely related Lamiid families.

The Boraginaceae RAS is a highly expressed BAHD acyltransferase paralogous to spermidine hydroxycinnamoyltransferase
To study RA biosynthesis in the Boraginaceae family, a littlestudied family of nonmodel plants, we selected P. campanularia, colloquially known as desert bells or California bluebells, as our experimental species (Fig. 1). P. campanularia grows easily in standard greenhouses, and its transcriptome has been sequenced previously through the 1000 Plants (1KP) initiative (26). Using LC high-resolution accurate-mass MS-based metabolic profiling, we confirmed that laboratory-grown P. campanularia produces RA (Fig. S1A).
We hypothesized that the Boraginaceae RAS, like that of Lamiaceae, is a relatively highly expressed BAHD acyltransferase and mined the P. campanularia transcriptome for genes that fit this criteria (Table S1) (18,27,28). Of the top 20 most highly expressed BAHD acyltransferase-encoding genes, the 15 that harbored full-length ORFs were cloned and transiently expressed in the leaf tissue of Nicotiana benthamiana using Agrobacterium-mediated infiltration (29). 5 days after infiltration, we assessed the accumulation of the immediate RAS product 4-coumaroyl-(R)-3-(4-hydroxyphenyl)lactate in the infiltrated N. benthamiana leaves by LC high-resolution accurate-mass MS. Of all 15 candidates tested, only the second most highly expressed P. campanularia BAHD gene yielded the RAS product; it is hereafter referred to as PcRAS (Table S1 and Fig. 3, A and C).
Molecular phylogenetic analysis of PcRAS together with other related BAHD acyltransferases from a variety of plant species revealed a Boraginaceae-specific RAS clade distinct from the Lamiaceae RAS clade (Fig. 2B). Although Lamiaceae RAS enzymes appear to have evolved from the hydroxycinnamoyl-CoA:shikimate hydroxycinnamoyltransferases (HCTs), which produce 4-coumaroyl shikimate, a key intermediate in the conserved plant phenylpropanoid metabolism, Boraginaceae RAS enzymes share an ancestor with the spermidine hydroxycinnamoyltransferases (SHTs), which produce hydroxycinnamoylspermidine conjugates reported to accumulate in the pollen coat of Arabidopsis thaliana (30 -34). Compared with most of the BAHD enzymes represented in the phylogeny, which accept negatively charged acyl acceptor substrates, SHT is unique because it prefers the positively charged acyl acceptor spermidine. Although BAHD acyltransferases are well-recognized for their relaxed substrate selectivity, they tend to be specific with regard to the charge identity of their acyl acceptor substrates, as this property is generally dictated by the electrostatic properties of the acyl acceptor binding pocket within the enzyme active site (35,36). It is therefore intriguing that Boraginaceae RAS, which has evolved to accommodate the negatively charged acyl acceptor substrate (R)-3-(4-hydroxyphenyl)lactate, is closely related to SHT. Functional analysis demonstrated that PcRAS does not retain SHT function, suggesting that it has lost its ancestral activity (Fig. S2).
Our phylogenetic analysis also revealed that P. campanularia contains a presumed ortholog of hydroxycinnamoyl-CoA:quinate hydroxycinnamoyltransferase (HQT), which is involved in chlorogenic acid biosynthesis (37). Indeed, metabolic profiling of P. campanularia leaf tissue revealed the presence of chlorogenic acid (Fig. S1B). Our enzymatic assays further confirmed the activity of PcHQT (Fig. 4A).

Figure 2. Phylogenetic analysis to inform enzymes involved in RA biosynthesis.
A, The RA biosynthetic scheme, starting from 4-coumaroyl-CoA and (R)-3-(4-hydroxyphenyl)lactate. B, a maximum likelihood phylogeny of BAHD acyltransferases closely related to PcRAS reveals an uncharacterized putative Boraginaceae RAS clade, distinct from the Lamiaceae RAS clade. C, a maximum likelihood phylogeny of P450s in the CYP98 family reveals two uncharacterized Boraginaceae-specific clades, putative meta-hydroxylases in the RA pathway, distinct from the Lamiaceae meta-hydroxylases. Bolded genes are those that were characterized in this study, whereas underlined genes are those characterized elsewhere and discussed here. In this set, asterisks denote genes involved in RA biosynthesis. Bootstrap values (based on 500 replicates) are indicated at the tree nodes. The scale measures evolutionary distances in substitutions per amino acid.

Evolution of rosmarinic acid biosynthesis
To test the function of these newly identified P. campanularia P450s, we coinfiltrated combinations of PcRAS, PcCYP98A112, and PcCYP98A113, as well as PcHQT and PcCYP98A111 as negative controls, into N. benthamiana leaves in the presence or absence of the RA precursors 4-coumarate and (R)-3-(4-hydroxyphenyl)lactate (Fig. 3). Although neither PcHQT nor PcCYP98A111 produced any on-pathway RA intermediates, transgenic expression of PcRAS was sufficient to produce 4-coumaroyl-(R)-3-(4-hydroxyphenyl)lactate (Fig. 3, A and C). The in planta function of PcRAS was further corroborated by stable transgenic expression of PcRAS in A. thaliana (Fig. 3D). Additionally, coinfiltration of PcRAS, PcCYP98A112, and PcCYP98A113 was sufficient to produce RA in a substratedependent manner (Fig. 3, B and C). Interestingly, the combination of PcRAS and PcCYP98A113, excluding PcCYP98A112, was sufficient to produce small quantities of RA. This observation suggests that an endogenous N. benthamiana meta-hydroxylase, perhaps the conserved C3ЈH, may be able to perform some degree of PcCYP98A112 function, especially in the pres-ence of excess substrate. This is consistent with the observation that PcCYP98A112 is phylogenetically more closely related to C3ЈH (Fig. 2C). PcCYP98A113, on the other hand, exhibits a more distant phylogenetic pattern, reflected in the relative length of the branch to the CYP98A113 clade, and also appears to have a more specialized biochemical function.

In vitro characterization of P. campanularia RA biosynthetic enzymes
We expressed and purified recombinant PcHCT, PcHQT, and PcRAS from E. coli and carried out Michaelis-Menten kinetics assays ( Fig. 4A and Table 1) to examine the substrate specificity of these enzymes toward a panel of acyl acceptor compounds in the presence of 4-coumaroyl-CoA as the common acyl donor at a saturating concentration, chosen based on previously reported K m values for HCT and RAS enzymes (27,38). As expected, PcHCT, PcHQT, and PcRAS were specific toward shikimate, quinate, and (R)-3-(4-hydroxyphenyl)lactate, respectively. However, we observed that both the K m and

Evolution of rosmarinic acid biosynthesis
V max measured for PcRAS against (R)-3-(4-hydroxyphenyl)lactate were inferior relative to CbRAS (Table S3) and those reported in the literature for other Lamiaceae RASs, demonstrating that PcRAS may not be as catalytically efficient under our experimental conditions (27,35). To rule out the possibility that an alternative substrate is preferred by PcRAS, we screened many possible variations in acyl donor and acyl acceptor compounds in accordance with possible RA pathway intermediates as well as different acyl acceptor stereoisomers (Fig. S3). Additionally, we quantified the absolute RA content across multiple tissues of C. blumei (Lamiaceae) and P. campanularia, ruling out the possibility that P. campanularia produces significantly lower quantities of RA than a Lamiaceae plant (Table S4). Collectively, these results suggest that, despite kinetic parameters differing from those of Lamiaceae RASs, PcRAS is likely sufficient to support RA accumulation in vivo. Alternatively, there may be additional levels of regulation that improve the catalytic efficiency of PcRAS in vivo.

Evolution of rosmarinic acid biosynthesis
the host's endogenous RNAi pathway to allow knockdown of a target gene of interest (39 -42). We were further encouraged by a reported instance of tobacco mosaic virus replication in P. campanularia (43). Because of its availability and widespread use, we started with the tobacco rattle virus (TRV) VIGS system. To facilitate viral replication and maximize VIGS efficiency, we utilized the mechanical transmission technique, where we first propagated viral VIGS vectors in N. benthamiana and subsequently applied the viral leaf extract onto P. campanularia leaves (Fig. 5A) (44).
To assess whether TRV VIGS is a viable gene knockdown method in P. campanularia, we first attempted to silence P. campanularia phytoene desaturase (PcPDS), a highly conserved gene involved in carotenoid biosynthesis (45). Silencing PDS in other plants causes an albino phenotype, serving as an apparent visual cue of successful gene silencing (39). In our initial optimization of the PcPDS silencing experiment, we found that knockdown was most effective in 4-week-old P. campanularia plants and that efficacy ranged from nearly uniform knockdown throughout the leaf to, more commonly, sparse knockdown on a background of healthy leaf tissue (Fig.  5B). Therefore, although lacking desirable robustness, TRV VIGS offers a functional approach to assess gene function in P. campanularia.
Next we designed a VIGS construct to silence PcRAS as well as a construct containing GFP to use as a control. Because depleted RA accumulation lacks an associated visible phenotype as that of PcPDS knockdown, we could not target specific parts of the affected tissues for analysis. We therefore harvested VIGS-treated whole leaves for metabolomics analysis. Leaves inoculated with PcRAS VIGS produced significantly lower quantities of RA than those inoculated with a control, although PcRAS VIGS did not cause complete depletion of RA accumulation (Fig. 5C). It is possible that the existing RA pool prior to VIGS depletes slowly when PcRAS expression is knocked down. Additionally, the knockdown nature of RNAi and the lack of robustness in tissue penetrance of TRV VIGS in P. campanularia may contribute to this.

Structural features of independently evolved Boraginaceae and Lamiaceae RAS enzymes
To elucidate the structural elements dictating the common biochemistry of two independently evolved RAS enzymes, we solved the X-ray crystal structure of C. blumei RAS (CbRAS) in complex with 4-coumaroyl-(R)-3-(4-hydroxyphenyl)lactate at 3.35 Å resolution (Table S5 and Fig. 6, A and B). CbRAS consists of two pseudosymmetrical domains, the N-terminal domain (residues 1-171 and 371-391) and C-terminal domain (residues 218 -370 and 392-429) connected by a long intervening loop (residues 172-217) that lacks secondary structure (Fig.  6A). The active site, created by the interface of the two domains, houses the 4-coumaroyl-(R)-3-(4-hydroxyphenyl)lactate ligand, with the acyl donor and acceptor moieties spatially located within the N-and C-terminal active site regions,
The structural alignment of the two RAS structures reveals a high degree of tertiary structure similarity (Fig. S5A), characteristic of BAHD acyltransferases (36,47,48). Nevertheless, CbRAS and PcRAS share overall low sequence identity (Fig. 6E) compared with other well-characterized orthologous groups of BAHDs and contain many active-site amino acid substitutions (Fig. S5) (30). Interestingly, a number of substitutions are functionally similar (Fig. 6C). For instance, in CbRAS, His-41, which appears to play a role in constructing the binding pocket for the 4-coumaroyl moiety of the ligand, is substituted in PcRAS by another aromatic amino acid, Tyr-53, which retains a similar side-chain volume and the capability to hydrogen-bond with the 4-OH group of the ligand. Additionally, Ser-39 in CbRAS is substituted by Thr-51 in PcRAS, indicating selection for a small, hydrophilic residue at this position, whereas conservation of hydrophobic residues is reflected by the substitution of Val-148 and Ile-373 in CbRAS to Met-163 and Met-403 in PcRAS, respectively. Moreover, the aromatic Tyr-398 in CbRAS is substituted by the similarly aromatic Phe-428 in PcRAS. These trends were supported across a more extensive multiple sequence alignment comparing Boraginaceae and Lamiaceae RAS enzymes and are consistent with the notion that RAS evolved independently in Boraginaceae and Lamiaceae (Fig. S5).
The structures also shared several conserved residues (Fig.  6D). As expected, both enzymes contain residues that are known to be involved in the catalytic cycle of a typical BAHD acyltransferase, such as the catalytic histidine (His-152 in CbRAS and His-167 in PcRAS) and the tryptophan involved in stabilizing the negatively charged, tetrahedral transition state intermediate (Trp-368 in CbRAS and Trp-398 in PcRAS) (36,48). Additionally, we observed the conservation of an activesite lysine that appears to form a salt bridge with the carboxylate group of the ligand (Lys-396 in CbRAS and Lys-426 in PcRAS) (Fig. S5). Recombinant alanine mutants of PcRAS at these three conserved positions all displayed less than 5% activity for 4-coumaroyl-(R)-3-(4-hydroxyphenyl)lactate production compared with WT PcRAS in vitro (Fig. S6). In contrast, mutations to the p-coumaroyl binding pocket of PcRAS did not result in a drastic reduction of catalysis (Fig. S6), suggesting that selective binding of the acyl acceptor molecule is more crucial for an efficient RAS protein. Previously, we identified an arginine residue in HCT that serves as a handle to attract and bind the acyl acceptor substrate shikimate into the proper position and orientation within the active site (36). It is likely that this lysine residue acts analogously throughout the RAS catalytic cycle, facilitating binding of (R)-3-(4-hydroxyphenyl)lactate by engaging in a guiding electrostatic interaction with the acyl acceptor substrate. It is of particular interest that, despite the independent origins of the Boraginaceae and Lamiaceae RASs from nonorthologous progenitors, both sets of enzymes converged toward an active-site lysine to mediate the specific acylacceptor-binding mechanism (Fig. S5).

Discussion
The RA biosynthetic pathway reported here is the first to be characterized from any non-Lamiaceae species. Notably, although Lamiaceae RASs and Boraginaceae RASs both belong to the same subfamily of the BAHD acyltransferases, indicating

Evolution of rosmarinic acid biosynthesis
their related evolutionary trajectories, Lamiaceae RASs were derived from an HCT progenitor, whereas we infer that the Boraginaceae RASs evolved from an ancestral SHT ( Fig. 2A) (49). Likewise, although the newly evolved P450s involved in RA biosynthesis in both Boraginaceae and Lamiaceae are members of the CYP98 family, Boraginaceae contains two separate clades of meta-hydroxylases with further specialized catalytic specificity, whereas Lamiaceae only contains a single dualfunction meta-hydroxylase (Fig. 2B). The simplest explanation for the phylogenetic pattern of these related CYP98 P450s is parallel evolution of neofunctionalized RA biosynthetic P450s in Boraginaceae and Lamiaceae from the highly conserved ancestral C3ЈH. Collectively, this evidence supports that the Boraginaceae RA biosynthetic pathway evolved independent from that of the Lamiaceae.
From a biophysical perspective, the fact that Boraginaceae RAS evolved from an ancestral SHT is intriguing because such an evolutionary trajectory calls for a switch in the charge specificity of the acyl acceptor substrate; SHT binds the positively charged spermidine, whereas RAS binds the negatively charged (R)-3-(4-hydroxyphenyl)lactate. This key event is further implied by the structural observation of the electrostatic nature of the acyl-acceptor-proximal region of the RAS active site. Interestingly, despite the disparate evolutionary histories of PcRAS and CbRAS, it appears that these enzymes converged upon the same molecular mechanism to establish a positive electrostatic identity within the acyl-acceptor-binding pocket and, in turn, promote efficient substrate binding. Our structural analysis and mutagenesis studies revealed an active-site lysine (Lys-426 in PcRAS and Lys-396 in CbRAS) that appears to engage in an ionic interaction with the carboxylate func-tional group of (R)-3-(4-hydroxyphenyl)lactate (Fig. 6C), similar to the mechanism by which an active-site arginine assists shikimate binding in HCT (36).
Our study is also the first to resolve two regioselective P450s catalyzing the two meta-hydroxylation steps of RA biosynthesis in a non-Lamiaceae species. A previous study characterizing CYP98A6 from Lithospermum erythrorhizon (Boraginaceae), which is a presumed ortholog of PcCYP98A112, noted this enzyme's ability to 3-hydroxylate the 4-coumaroyl moiety of 4-coumaroyl-(R)-3-(4-hydroxyphenyl)lactate or 4-coumaroyl-(R)-3-(3,4-dihydroxyphenyl)lactate, whereas a study of C. blumei revealed that CYP98A14 is a bifunctional P450 capable of hydroxylating both the 3 and 3Ј positions of 4-coumaroyl-(R)-3-(4-hydroxyphenyl)lactate (19,50). Of note, we and others were unable to identify any additional Lamiaceae CYP98 genes putatively involved in RA biosynthesis. It seems plausible that the meta-hydroxylation steps of RA pathway intermediates evolved to be a one-enzyme process in Lamiaceae and a twoenzyme process in Boraginaceae (Fig. 7). Moreover, in Boraginaceae, it appears that 3-hydroxylation of the acyl acceptor moiety is functionally more derived than 3Ј-hydroxylation of the acyl donor moiety. This is implied by the more distant evolutionary relationship of PcCYP98A113 with the C3ЈH ancestral clade (Fig. 2C) and supported by the observation that PcCYP98A111 can catalyze the conversion of 4-coumaroyl-(R)-3-(3,4-dihydroxyphenyl)lactate to RA at low efficiency (Fig.  S4A). Additionally, coinfiltration of PcRAS and PcCYP98A113 is sufficient to produce some quantity of RA, indicating that the endogenous N. benthamiana C3ЈH may be compensating for PcCYP98A112 function (Fig. 3B). Notably, the activity assay results indicate that there may not be any requisite order of

Evolution of rosmarinic acid biosynthesis
hydroxylation, suggesting the existence of a metabolic grid wherein 4-coumaroyl-(R)-3-(4-hydroxyphenyl)lactate is shuttled serendipitously between CYP98A112 and CYP98A113 until it is converted to RA (Fig. 4, B and C). Finally, although we observed that either PcCYP98A112 or PcCYP98A113 is sufficient to produce a small quantity of RA from 4-coumaroyl-(R)-3-(4-hydroxyphenyl)lactate in vitro, the results of our N. benthamiana infiltration experiments indicate that the majority of RA in vivo is likely produced by a combination of the two cytochrome P450 hydroxylases (Fig. S4B and Fig. 3B).
To adapt to challenging terrestrial environments with a sessile lifestyle, plants have evolved highly pliable specialized metabolic systems that continue to give rise to new metabolic traits. This evolutionary development is rooted in ancestral enzyme promiscuity and largely driven by gene duplication followed by sub-and neofunctionalization (51)(52)(53)(54)(55)(56)(57). Our findings reflect upon this remarkable plasticity of specialized metabolism and address a long-standing question regarding the evolutionary origins of RA in the plant kingdom. Our results reveal a scenario where two sister families within the Lamiid clade of flowering plants have evolved the ability to produce RA using similar chemical logic but through a different set of metabolic enzymes. In addition to the Lamiids, there are many other sporadic appearances of RA throughout the plant kingdom, and in nearly all of these instances, we lack an understanding of the underlying biosynthesis. Extrapolating our findings, it seems likely that many of these other cases are also examples of independent metabolic evolution.

Transcriptome assembly
P. campanularia raw RNA-Seq reads (FASTQ files) were downloaded from the 1KP project (SRA ID ERR2040527) (26), trimmed for sequencing adaptors using Trimmomatic (59), and assembled into a de novo transcriptome using Trinity v. 2.6.5 (60), resulting in 43,709 transcript sequences. Within those, 32,155 putative open reading frames were identified using Transdecoder (61). Gene expression statistics were determined using RSEM with bowtie2 as an aligner (62,63). Transcripts and predicted protein sequences were annotated with transcriptper-million values and closest BLAST hits from the Uni-ProtKB/Swiss-Prot database using in-house scripts. Transcriptome mining was performed on a local BLAST server (64).

Phylogenetic analysis of candidate RA biosynthetic enzymes
Candidates for evolutionary analysis were selected from the assembled P. campanularia transcriptome (26). Sequence alignments were performed in MEGA7 using the MUSCLE algorithm, and all phylogenetic analyses were conducted in MEGA7 (65,66). Evolutionary history was inferred by using the maximum likelihood method based on the Le_Gascuel_2008 model (67). Evolutionary rate differences among sites were modeled with a discrete Gamma distribution with five categories. Bootstrap values were calculated using 500 replicates.

Molecular biology
Total RNA was extracted from P. campanularia leaf tissue using the RNeasy Plant Mini Kit (Qiagen). First-strand comple-mentary DNA was synthesized from RNA using the Super-Script III First-Strand Synthesis System with the oligo(dT) primer (Thermo Fisher Scientific). ORFs were amplified from complementary DNA using Phusion High-Fidelity DNA Polymerase (New England Biolabs) and gene-specific primers. Plasmids were linearized using restriction enzymes purchased from New England Biolabs as follows: pHIS8 -4 with NcoI, pEAQ-HT with AgeI and XhoI, pEarleyGate 100 with HindIII and EcoRI, pJKW 1145 with XhoI and XbaI, pTRV2 with BamHI and EcoRI, and pYEPD60 with BamHI and EcoRI. ORFs were cloned into target vectors using Gibson assembly (68). The accession number for CbRAS is CAK55166.1. PcRAS Y53F, T51A, and K426A mutant plasmids were prepared from the cloned PcRAS-pHIS8 -4 plasmid following the Agilent QuikChange II site-directed mutagenesis protocol. PcRAS W398A and H167A mutant plasmids were prepared from the cloned PcRAS-pHIS8 -4 plasmid by PCR amplification of two overlapping gene fragments using primers bearing the desired mutation and cloning both fragments into the pHIS8 -4 vector using Gibson assembly. All primers used in this study are listed in Table S6.

Transient expression of candidate genes in N. benthamiana and metabolite analysis
PcHQT, PcRAS, PcCYP98A111, PcCYP98A112, PcCYP98A113, and additional genes listed in Table S1 were cloned into pEAQ-HT (29). Vectors were transformed into Agrobacterium tumefaciens strain LBA4404 (69). Bacteria were cultivated at 30°C in YM medium (0.4 g/liter yeast extract, 10 g/liter mannitol, 0.1 g/liter NaCl, 0.2 g/liter MgSO 4 ⅐7H 2 O, and 0.5 g/liter K 2 HPO 4 ⅐3H 2 O) to an A 600 between 2.0 and 2.5, washed with 0.5ϫ PBS, and then diluted with 0.5ϫ PBS to an A 600 of 0.8. For coinfiltration of multiple gene constructs, A. tumefaciens strains were mixed in equal concentrations to a final A 600 of 0.8. 1 ml of bacterial suspension was infiltrated into leaves of N. benthamiana plants between 4 and 6 weeks of age. Leaves were harvested 5 days after infiltration.

Transgenic A. thaliana
To generate pJKW 1145, pEarleyGate 100 was digested with EcoRI and XbaI to remove the 35 S promoter (p35S) and ccdB cassette and then Gibson-assembled with the A. thaliana C4H (AtC4H) promoter amplified from plasmid pCC 0996 (70,71). PcHCT and PcRAS were cloned into pEarl-eyGate 100, and PcRAS was additionally cloned into pJKW 1145. Cloned plasmids were transformed into A. tumefaciens strain GV3130. WT A. thaliana were transformed with the flower dip method, and transformants in the T1 generation were selected by spraying with a 1:500 dilution of Finale (Bayer) (72). Samples for metabolite extraction were harvested from T2 plants.

Mechanical transmission of VIGS to P. campanularia
A 268-bp fragment from the 3Ј end of PcRAS and fulllength GFP were cloned into pTRV2-MCS (73). Together with pTRV1, the cloned vectors were transformed into A. tumefaciens strain GV2260, and cultures were prepared according to a published protocol (74). TRV1 and TRV2 cul-

Evolution of rosmarinic acid biosynthesis
tures were mixed to a final A 600 of 0.5, and 1 ml of the mixture was infiltrated into the leaves of 4-week-old N. benthamiana plants. Leaves were harvested 3 days after infiltration, frozen in liquid nitrogen, and immediately ground in 0.1 M phosphate buffer (pH 7.0) (10 ml/1 g of fresh weight of tissue) and 400-mesh-particle-size silicon carbide (Sigma-Aldrich) abrasive (0.4 g/1 g of fresh weight of tissue). The mixture was gently applied to the surface of P. campanularia leaves, stirring frequently to ensure homogenous suspension of the abrasive throughout the procedure. Inoculated leaves were washed immediately under cold tap water for 5 s. Leaves were harvested 2 weeks following inoculation.

Metabolite extraction from plant tissue samples
Tissues from N. benthamiana, A. thaliana, and P. campanularia were harvested into grinding tubes containing 250 l of 2.0 mm zirconia/silica disruption beads (Research Products International), frozen in liquid nitrogen, and stored at Ϫ80°C until metabolite extraction. Frozen tissue was homogenized with a TissueLyser II (Qiagen), resuspended in 500 l of 50% methanol/100 mg of tissue, and incubated at 65°C for 1 h. Extracts were centrifuged at 25,000 ϫ g for 30 min, and the supernatant was collected for analysis.

Recombinant acyltransferase expression and purification
PcHCT, PcHQT, PcRAS, and PcSHT were cloned into pHIS8 -4, an isopropyl 1-thio-␤-D-galactopyranoside (IPTG)inducible protein expression vector with an N-terminal His 8 tag followed by a tobacco etch virus cleavage site. Cloned vectors were transformed into the Escherichia coli BL21(DE3) strain. Each strain was grown in Terrific Broth medium for 3 h at 37°C, followed by induction of expression with 0.5 mM of IPTG at 18°C overnight. Cells were harvested by centrifugation at 5,000 ϫ g for 15 min, resuspended in lysis buffer (50 mM Tris (pH 8.0), 500 mM NaCl, 20 mM imidazole (pH 8.0), 10% glycerol, 1% Tween 20, and 10 mM 2-mercaptoethanol) and lysed at 4°C with an M-110L microfluidizer (Microfluidics). Lysate was clarified by centrifugation at 25,000 ϫ g for 45 min at 4°C and immediately affinity-purified with a HisTrap HP histidinetagged protein purification column (GE Healthcare Life Sciences). His-tagged tobacco etch virus protease was added to the eluate, and the solution was dialyzed overnight at 4°C in 50 mM Tris (pH 8.0), 50 mM NaCl, and 10 mM 2-mercaptoethanol. Cleaved dialyzed protein was collected by gravity purification through nickel-nitrilotriacetic acid resin (Qiagen), concentrated to 5 ml using 30-kDa Amicon Ultra-15 centrifugal filter units (EMD Millipore), and purified via gel filtration on a HiLoad 16/600 Superdex 200 pg column (GE Healthcare Life Sciences). The eluate was collected and dialyzed overnight at 4°C in 12.5 mM Tris (pH 8.0), 50 mM NaCl, and 2 mM DTT. Dialyzed protein was concentrated to 15 mg/ml in 30-kDa Amicon Ultra-15 centrifugal filter units, flash-frozen in liquid nitrogen, and stored at Ϫ80°C until use. Both affinity purification and gel filtration were performed on Ä KTA Pure FPLC (GE Healthcare Life Sciences). PcRAS mutant proteins were prepared using the same protocol.

P450 expression and microsome purification in Saccharomyces cerevisiae
PcCYP98A111, PcCYP98A112, and PcCYP98A113 were cloned into pYEPD60 and transformed into the S. cerevisiae WAT11 strain (75). Each strain was grown at 30°C in YPGE medium (5 g/liter glucose, 10 g/liter peptone, 10 g/liter yeast extract, and 3% (v/v) ethanol) to an A 600 of 2.7, induced with 20 g/liter galactose, and grown at 30°C to an A 600 of 3.5. Cells were washed three times by centrifugation at 1,000 ϫ g for 5 min and resuspension in high-salt Tris buffer (50 mM Tris, 4 mM EDTA, 150 mM NaCl, and 20% glycerol (pH 7.4)). Washed, resuspended cells were lysed by vortexing with zirconia/silica disruption beads (Research Products International) and centrifuged at 1,000 ϫ g for 10 min, and then the supernatant was transferred. 0.1 g/ml of PEG 3350 was added slowly to the supernatant, and the mixture was agitated for 15 min at 4°C before centrifugation at 25,000 ϫ g for 10 min. The pellet was resuspended in 20 ml of assay buffer (50 mM Tris, 4 mM EDTA, and 20% glycerol (pH 7.4)), homogenized with a Dounce homogenizer, flash-frozen in liquid nitrogen, and stored at Ϫ80°C until use.

Enzyme assays
For acyltransferase enzyme kinetics assays, a master mix of 917 nM A. thaliana 4CL1 in 50 mM Tris-HCl (pH 7.5), 5 mM MgCl 2 , 2 mM ATP, 2 mM p-coumaric acid, and 3 mM CoA was equilibrated overnight at room temperature to yield p-coumaroyl CoA at an approximate concentration of 2 mM. The reactions consisted of 75 l of master mix and 5 l of substrate and initiated with 20 l of enzyme. 2.09 nmol of PcHCT, 2.07 nmol of PcHQT, and 1.95 nmol of PcRAS were used in each reaction, with the following exceptions: 0.209 nmol of PcHCT was used in the shikimate reactions, and 0.207 nmol of PcHQT was used in the quinate reactions. Concentrations of shikimate, quinate, and (R)-3-(4-hydroxyphenyl)lactate were 10, 50, 100, 250, 500, 1,000, 2,500, and 5,000 M. (R)-3-(4-hydroxyphenyl)lactate was tested additionally at 25 M. Reactions were timed for 10 min and quenched by adding 10 l of 10 M acetic acid and used immediately for LC-MS analysis. Time points were confirmed to be within the initial rate by running reactions to completion in the presence of excess enzyme over 1 h and ensuring that measured values were less than 5% of the total amount of product formed. PcRAS mutagenesis assays were performed likewise using 2 nmol of enzyme and 2 mM of (R)-3-(4-hydroxyphenyl)lactate and performed in triplicate. Standard curves were obtained by running reactions with 10, 25, 50, 100, and 250 M substrate to completion and then used to absolutely quantify initial rate measurements. K m and V max were determined by fitting the data to the Michaelis-Menten equation using nonlinear regression in Prism version 7.0a (GraphPad). To measure P450's activity to produce 4-coumaroyl shikimate, 4-coumaroyl-(R)-3-(4-hydroxyphenyl)lactate, 4-coumaroyl-(R)-3-(3,4-dihydroxyphenyl)lactate, or caffeoyl-(R)-3-(4-hydroxyphenyl)lactate, master mixes (50 mM Tris (pH 7.5), 2 mM 4-coumaroyl-CoA or caffeoyl-CoA, 5 mM appropriate acyl acceptor, and 2 nmol of PcHCT or PcRAS) were set up at room temperature for 2 h. In a total volume of 100 l, 30 l of the master mix was suspended

Evolution of rosmarinic acid biosynthesis
in 100 mM Tris (pH 7.5) and 5 mM NADPH, and 100 l of thawed microsomes was added to start the reaction. Reactions were quenched after 15, 30, or 60 min with 100 l of 10 M acetic acid and used immediately for LC-MS analysis.

Crystallization and structure determination of CbRAS
CbRAS was codon-optimized for E. coli expression, ordered as a synthetic gene, and cloned into the pHIS8 -3 vector, an IPTG-inducible protein expression vector with an N-terminal His 8 tag followed by a thrombin cleavage site. Protein was expressed and purified as described previously. CbRAS 4-coumaroyl-(R)-3-(4-hydroxyphenyl)lactate crystals were grown by hanging drop vapor diffusion at 4°C by mixing 2 l of 10 mg/ml CbRAS with 1 l of a reservoir solution containing 0.2 M MgCl 2 , 0.1 M sodium 3-{[2-hydroxy-1,1-bis(hydroxymethyl)ethyl]-amino}-1-propanesulfonic acid (pH 8.5), 21% PEG 8000, and 15 mM 4-coumaroyl-(R)-3-(4-hydroxyphenyl)lactate. Single crystals were transferred to a cryoprotection solution containing 30% glycerol in reservoir solution, mounted in a cryoloop, and flash-frozen in liquid nitrogen. X-ray diffraction data were collected at beamline 8.2.1 of the Advanced Light Source at the Lawrence Berkeley National Laboratory equipped with ADSC Quantum 315 CCD detectors. Molecular replacement using A. thaliana HCT as a search model was done with Phaser under Phenix (76). Further structural refinement utilized Phenix programs (77). Coot was used for manual map inspection and model rebuilding (78). Crystallographic calculations were performed using Phenix. The Fo-Fc omit map for the ligand was calculated by removing the ligand coordinates from the structure and re-refining in Phenix. The atomic coordinates and structure factors of CbRAS have been deposited in the Protein Data Bank (www.rcsb.org) 4 under PDB code 6MK2 (79).

Structural modeling of PcRAS
PcRAS was modeled by providing the primary amino acid sequence to the Robetta Full-Chain Protein Structure Prediction Server for Structure Prediction (46). The process was run using default settings, and the top output was used for comparative structural analysis. The root mean square deviation between the CbRAS crystal structure and the PcRAS model is 1.232 Å, as calculated in PyMOL.