Harnessing evolutionary diversification of primary metabolism for plant synthetic biology

Plants produce numerous natural products that are essential to both plant and human physiology. Recent identification of genes and enzymes involved in their biosynthesis now provides exciting opportunities to reconstruct plant natural product pathways in heterologous systems through synthetic biology. The use of plant chassis, although still in its infancy, can take advantage of plant cells' inherent capacity to synthesize and store various phytochemicals. Also, large-scale plant biomass production systems, driven by photosynthetic energy production and carbon fixation, could be harnessed for industrial-scale production of natural products. However, little is known about which plants could serve as ideal hosts and how to optimize plant primary metabolism to efficiently provide precursors for the synthesis of desirable downstream natural products or specialized (secondary) metabolites. Although primary metabolism is generally assumed to be conserved, unlike the highly diversified specialized metabolism, primary metabolic pathways and enzymes can differ between microbes and plants and also among different plants, especially at the interface between primary and specialized metabolism. This review highlights examples of the diversity in plant primary metabolism and discusses how we can utilize these variations in plant synthetic biology. I propose that understanding the evolutionary, biochemical, genetic, and molecular bases of primary metabolic diversity could provide rational strategies for identifying suitable plant hosts and for further optimizing primary metabolism for sizable production of natural and bio-based products in plants.

Plants produce diverse and often abundant chemical compounds, which play critical roles in enabling these sessile, multicellular organisms to inhabit various environmental niches. Many of these phytochemicals are produced in a lineage-specific manner and thus are often referred to as specialized or secondary metabolites. Many of these plant natural products also provide human society with essential nutrients and valuable resources for the production of pharmaceuticals and biomaterials (1-3). Next-generation sequencing and advanced MS technologies are enabling rapid identification of plant specialized metabolic enzymes (4-7) and are providing exciting opportunities to produce plant natural products in heterologous systems through synthetic biology (Fig. 1A). Microbial hosts that have well-developed genetic tools and industrial-scale culture methods (e.g. yeast) have been engineered to build chemical production platforms that are optimized for a certain primary metabolic branch, onto which various downstream pathways, including plant specialized metabolic pathways, have been introduced (8-15). Although significant success has been achieved in industrial-scale terpenoid production in microbes (8, 14), microbial production of certain classes of plant natural products, such as alkaloids and phenolics, appears to be more challenging, likely due to their toxicity, pathway complexity, and the inefficiency of plant-derived enzymes (10, 16-18).
The use of heterologous plant hosts, although still in its early stages, provides an alternative and sustainable means to produce plant natural products, taking advantage of global cultivation systems that are propelled by endogenous photosynthetic energy production and carbon fixation (Fig. 1B) (19-22). The past decade of investment and effort in developing bioenergy crops (e.g. perennial grasses, fast-growing trees) has further advanced opportunities to grow high-yielding plants on marginal lands, which can avoid direct competition with food crop production and minimize environmental impacts (23-28). Plant hosts may also have better storage capacity and toxicity resistance for phytochemical production compared with microbial hosts (Fig. 1B). Thus, plant chassis potentially provide promising alternative platforms to produce some of the metabolites that are difficult to produce in microbes, especially if tailored plant hosts (or chassis) are carefully selected and generated depending on the downstream target compounds.

Challenges to build plant chassis for synthetic biology
Many specialized metabolic pathways have been successfully introduced to heterologous plants (29-34). However, relatively little effort has been made in plants to optimize the supply of their primary metabolite precursors (e.g. amino acids, sugars, nucleotides, and fatty acids), from which specialized metabolites are produced (Fig. 1A) (24). Microbial metabolic engineering and synthetic biology studies demonstrated that redirection of carbon flux and efficient supply of a specific primary precursor(s) are critical to achieve efficient production of downstream target products (Fig. 1A) (16, 35-38). Thus, holistic understanding and engineering of both primary and specialized metabolisms are crucial for efficient and sizable production of natural products in plants.
Unlike in microbes, engineering of plant primary metabolism poses several major challenges (Fig. 1B). (i) There is a much more limited capacity to conduct genetic engineering and mutagenesis screening in plants than in microbes, due to low transformation efficiency and long generation cycles of most plants (months to years versus hours to days). (ii) Plant metabolism is likely more constrained due to almost exclusive reliance on the carbon input from photosynthetic CO2 fixation, unlike microbes that can utilize multiple carbon sources. (iii) Plant primary metabolic pathways are tightly integrated with each other and directly linked to the growth and development of these complex multicellular organisms, and their manipulation often compromises overall growth and yield (39-44).
One way to overcome these challenges is to carefully choose host plants that are naturally tailored toward production of certain classes of compounds, and then to conduct rational and precise engineering of primary metabolism to optimize the supply of a certain precursor. Here, I discuss one promising approach to achieve this goal: learning from the millions of years of experimentation that nature has carried out. Although primary metabolism is generally assumed to be conserved across the plant kingdom, unlike highly diversified specialized metabolism (45-48), there are some examples of evolutionary diversification of primary metabolic pathways, especially at the interface between primary and specialized metabolism (49). Exploring and harnessing such relatively rare but key evolutionary innovations of plant metabolism will provide useful tools and strategies to optimize plant primary metabolism in coordination with downstream specialized metabolic pathways, in order to achieve efficient production of plant natural products in carefully selected plant hosts.

Ancient diversifications of primary metabolism in plants from other kingdoms
Despite the general conservation of primary metabolic pathways among different kingdoms of life, some are unique to plants and likely contributed to the tremendous chemical diversity seen in the plant kingdom today. Understanding such fundamental differences provides a critical basis for constructing plant chemical production platforms through metabolic engineering. Here, I highlight prominent examples found in primary metabolic pathways that support two major classes of plant natural products, terpenoid (isoprenoid) and phenylpropanoid compounds.

Two alternative isopentenyl diphosphate biosynthetic pathways to support diverse terpenoid formation in plants
Isopentenyl diphosphate (IPP) and its allylic isomer dimethylallyl diphosphate (DMAPP) are the precursors and building blocks of diverse isoprenoid compounds, such as sterols (e.g. cholesterol), dolichol, and quinones (e.g. ubiquinone). In plants, IPP and DMAPP are also used to synthesize photosynthetic pigments (i.e. chlorophylls and carotenoids) and quinones (i.e. plastoquinone and phylloquinone), plant hormones (e.g. gibberellins, brassinosteroids, and abscisic acid), rubbers, isoprene, mono- and sesquiterpene volatiles, and diverse di- and triterpenoids (50-54). IPP (and DMAPP) can be synthesized via two different routes, the mevalonate (MVA) and 2-C-methyl-D-erythritol 4-phosphate (MEP) pathways (Fig. 2) (36, 50, 55). Most organisms have only one of the two pathways: for example, the MVA pathway is present in animals, fungi, and archaea, and the MEP pathway is found in many bacteria, including Escherichia coli and cyanobacteria (56-58). Notably, however, plants and many algae have both the MVA and MEP pathways to synthesize IPP and DMAPP, which support the formation of these diverse isoprenoid compounds in different subcellular compartments (Fig. 2).
Figure 1. [...] The underlying specialized metabolic pathways can be identified and reconstructed in a heterologous host, or chassis, through synthetic biology (green, right) for efficient production of target compounds (e.g. nutraceuticals, pharmaceuticals, and bio-based materials). Additionally, the upstream primary metabolic pathways can be engineered in the host to optimize the supply of a specific precursor(s) (blue, right). B, Besides microbial hosts, plants can provide alternative chassis to produce natural plant products in sustainable and potentially efficient manners, if their pros and cons (table) are carefully evaluated and addressed. See Appendix S1 for image credits.
JBC REVIEWS: Harnessing plant primary metabolic diversity
These two pathways appear to have some but limited metabolic cross-talk (55, 59-63). Although various isoprenoids, including the plant-derived sesquiterpene artemisinin, have been successfully produced through microbial synthetic biology (12-14, 64), the natural capacity of plants to produce abundant IPP can also be utilized for production of various isoprenoid compounds using plant hosts (36, 65-67). [...] efficient production of isoprenoid compounds in plants (68-74), and it is critical to understand how plants regulate IPP and DMAPP production through the MVA and MEP pathways.
Figure 2. Plants have two alternative pathways to synthesize the IPP precursor for production of diverse terpenoid compounds. The MVA pathway occurs outside of the plastids and provides IPP and DMAPP precursors for downstream specialized metabolism to synthesize diverse sterols, sesquiterpenes, and triterpenes, for example. The MEP pathway is localized in the plastids and supports biosynthesis of isoprene and monoterpene volatiles, various diterpenes, and photosynthetic isoprenoids (e.g. chlorophylls and plastoquinone). The compartment in light blue depicts the ER. Although it is not shown here, some of the MVA pathway enzymes, PMK, MPDC, and IDI, appear to be localized in the peroxisome, in addition to the cytosol. The alternative MVA pathway enzymes of archaea are shown in gray. MVPP, mevalonate-5-diphosphate. Enzymes abbreviated in boxes include: Fd, ferredoxin; MK, MVA kinase; Nudix, Nudix hydrolase. See the footnotes for other abbreviations introduced in the text.
Recent studies revealed further complexity of the plant MVA pathway and its regulation. The last two steps of the MVA pathway appear to be flipped in archaea and Chloroflexi bacteria: mevalonate 5-phosphate (MVP) is converted by phosphomevalonate decarboxylase (PMD) to isopentenyl phosphate (IP), which is then further phosphorylated to IPP by ATP-dependent isopentenyl phosphate kinases (IPKs) (gray in Fig. 2) (87-90). All sequenced plant genomes also encode the IPK enzymes, which can phosphorylate both IP and dimethylallyl phosphate (DMAP) to IPP and DMAPP, respectively (88, 91). Unlike archaea, however, plants appear to lack the PMD orthologs and instead produce IP and DMAP by Nudix hydrolases through dephosphorylation of IPP and DMAPP, respectively (Fig. 2) (92). Further genetic studies demonstrated that reducing the formation of IP and DMAP by either down-regulating Nudix hydrolase or up-regulating IPK led to elevated accumulation of both sesquiterpenes and monoterpenes produced in the cytosol and plastids, respectively (91, 92). These results suggest that IP and DMAP negatively regulate terpenoid production in plants. Therefore, the reactivation of IP and DMAP through phosphorylation provides a promising approach to enhance terpenoid production in plants, especially when combined with up-regulation of other rate-limiting enzymes of the MVA pathway, such as HMGR and PMK (92).
The alternative MEP pathway takes place in the plastids and starts from the thiamine diphosphate-dependent condensation of glyceraldehyde 3-phosphate and pyruvate to 1-deoxy-D-xylulose 5-phosphate (DXP), which is then reductively isomerized to MEP (Fig. 2). MEP is activated by coupling to cytidine triphosphate (CTP) and ATP-dependent phosphorylation, followed by cyclization to 2-C-methyl-D-erythritol-2,4-cyclodiphosphate (MEcPP). MEcPP undergoes ring opening and reductive dehydration to 4-hydroxy-3-methyl-butenyl 1-diphosphate (HMBPP), which is then converted to both IPP and DMAPP (Fig. 2) (55, 70). Given that one of the MEP precursors, glyceraldehyde 3-phosphate, is the primary product of the Calvin-Benson cycle, the plastidic MEP pathway likely provides a robust IPP precursor supply for synthesis of abundant photosynthetic isoprenoids, including chlorophylls, carotenoids, and prenylquinones, as well as isoprene, which can account for 99% of de novo synthesized isoprenoids in poplar leaves (93). Indeed, stable isotope-labeled 13CO2 is rapidly incorporated into the intermediates of the MEP but not the MVA pathway in illuminated Arabidopsis leaves (94). The last two enzymes, HMBPP synthase (HDS) and reductase (HDR), are iron-sulfur cluster proteins and can accept electrons directly from ferredoxin, the final donor of the photosynthetic electron transport chain, under light (95-97). This likely provides an additional mechanism of coordination between photosynthesis and the MEP pathway in the chloroplasts (Fig. 2).
One might speculate that the plastidic MEP pathway of plants and algae is derived from endosymbiosis of cyanobacteria, which also synthesize IPP and DMAPP by the MEP pathway. However, evolutionary analyses of individual MEP pathway enzymes of plants and algae revealed that these enzymes have mosaic evolutionary origins and share last common ancestors with either cyanobacteria, α-proteobacteria, or Chlamydia; some of these genes were horizontally transferred to a common ancestor of plastid-bearing eukaryotes (57, 58). Because of its complex evolutionary history and the high and diverse demand for synthesizing numerous and abundant isoprenoid compounds, the plant MEP pathway is likely regulated differently from that of bacteria. The initial reaction, catalyzed by DXP synthase (DXS), is irreversible and commits carbon to the MEP pathway. The DXS enzyme hence plays the major role in controlling the flux through the MEP pathway, with a flux control coefficient of 0.82 in Arabidopsis leaves (94). A coefficient of 0 or 1 indicates that an individual enzyme (here, DXS) exerts no control or complete control, respectively, over the flux through an entire pathway (here, the MEP pathway). However, DXS overexpression led to only a modest increase in isoprenoid accumulation, partly due to the export of the downstream MEcPP intermediate to a nonplastidic pool (Fig. 2) (94), which, interestingly, can participate in plastid-to-nucleus retrograde signaling (98, 99). Also, DXS protein level and activity are regulated through the stromal protein quality control system mediated by concerted actions of Hsp chaperones and Clp proteases (100-102). DXS from poplar is feedback-inhibited by IPP and DMAPP in a noncompetitive manner (74, 93), which may set the upper limit of IPP and DMAPP accumulation in the plastids. Furthermore, like many other plastidic enzymes (e.g. glyceraldehyde 3-phosphate dehydrogenase and glutamine synthetase), the downstream enzymes, DXP reductase (DXR), HDS, and HDR, are targets of thioredoxins and are likely subjected to redox regulation (78, 103-105), although the physiological significance of this regulation remains to be tested. Thus, modulating both transcriptional and post-transcriptional regulation, along with the MEcPP-mediated signaling pathway, will likely lead to enhanced supply of IPP and DMAPP in the plastids and increased production of MEP pathway-derived isoprenoid compounds in plants. It remains to be explored, however, whether some of these MVA and MEP pathway regulations are different in certain plant lineages. Such variations in this key plant metabolic branch, if any, can provide useful tools to further improve IPP and/or DMAPP supply and downstream terpenoid production.
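To make the meaning of a flux control coefficient concrete, the sketch below computes it numerically for a toy two-step pathway. Everything in it is an illustrative assumption rather than the model used in (94): the function names, the rate laws (an irreversible first step that is feedback-inhibited by the pathway product, loosely mirroring DXS inhibition by IPP and DMAPP, followed by a linear consuming step), and all parameter values are hypothetical.

```python
import math


def steady_state_flux(e1, e2, k1=1.0, k2=1.0, ki=1.0):
    """Steady-state flux of a toy pathway (all rate laws are assumptions).

    Step 1 (irreversible, feedback-inhibited): v1 = e1 * k1 / (1 + P / ki)
    Step 2 (linear consumption of product P):  v2 = e2 * k2 * P
    """
    supply = e1 * k1
    drain = e2 * k2
    # Setting v1 == v2 gives a quadratic in P; take the positive root.
    pool = 0.5 * ki * (-1.0 + math.sqrt(1.0 + 4.0 * supply / (drain * ki)))
    return drain * pool


def flux_control_coefficient(flux, enzymes, index, rel_step=1e-6):
    """Estimate d ln(J) / d ln(E_index) by a central finite difference."""
    up = list(enzymes)
    down = list(enzymes)
    up[index] *= 1.0 + rel_step
    down[index] *= 1.0 - rel_step
    d_ln_flux = math.log(flux(*up)) - math.log(flux(*down))
    d_ln_enzyme = math.log(1.0 + rel_step) - math.log(1.0 - rel_step)
    return d_ln_flux / d_ln_enzyme


enzyme_levels = (1.0, 1.0)  # hypothetical relative amounts of enzymes 1 and 2
c1 = flux_control_coefficient(steady_state_flux, enzyme_levels, 0)
c2 = flux_control_coefficient(steady_state_flux, enzyme_levels, 1)
print(f"C1 = {c1:.3f}, C2 = {c2:.3f}, sum = {c1 + c2:.3f}")
```

In this toy model the two coefficients sum to 1, and feedback inhibition keeps the coefficient of the first step below 1 even though that step is irreversible.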

Alternative phenylalanine biosynthetic pathways for phenolic compound production in plants
L-Phenylalanine (Phe) is an aromatic amino acid required for protein synthesis in all organisms and is produced in microbes and plants but not in animals (106-109). Plants also use Phe as the precursor to synthesize various phenolic natural products, including diverse and abundant phenylpropanoids, such as lignin, lignans, condensed tannins, flavonoids, anthocyanin pigments, coumarins, stilbenes, and more (110, 111). Some of these phenolic compounds likely played critical roles during plant evolution, such as UV-absorbing phenolic compounds (e.g. sinapoyl derivatives), lignin, and sporopollenin during the evolution of land, vascular, and seed plants, respectively (112). The defense hormone salicylic acid and the electron carrier ubiquinone can also be synthesized from Phe in plants (113-116). Significantly, up to 30% of total deposited carbon can be directed toward Phe biosynthesis in vascular plants for the production of lignin and tannin (117, 118). Thus, most plants have an inherent capacity to produce a large quantity of phenolic natural products, and it is important to understand the biochemical and genetic mechanisms underlying and controlling the production of the Phe precursor. Although efforts have been made to reduce the content or modify the composition of lignin, which impedes bio-ethanol production by microbial fermentation of cellulosic plant biomass (119-121), increased synthesis of Phe will enable production of a variety of Phe-derived natural products and other phenolic compounds (17, 18, 24, 110).
Phe biosynthesis starts from the shikimate pathway, which converts erythrose 4-phosphate and phosphoenolpyruvate, derived from the pentose phosphate pathway and glycolysis, respectively, into chorismate, the last common precursor of all three aromatic amino acids: Phe, L-tyrosine, and L-tryptophan (Fig. 3) (106, 107). Although plants and microbes have a very similar tryptophan biosynthetic pathway (122, 123), plants have different biosynthetic routes for Phe and tyrosine from most microbes. In model microbes such as E. coli and yeast, chorismate is converted by chorismate mutase (CM) into prephenate, which undergoes dehydration or NAD(P)+-dependent oxidative decarboxylation into phenylpyruvate or 4-hydroxyphenylpyruvate, followed by transamination into Phe or tyrosine, respectively (gray pathways in Fig. 3) (108). In most plants, Phe and tyrosine biosynthesis predominantly proceeds via a different, nonproteinogenic amino acid intermediate, L-arogenate, in the plastids. In the arogenate pathway, prephenate is first transaminated to arogenate (124-127), which then undergoes dehydration or NADP+-dependent oxidative decarboxylation into Phe or tyrosine, respectively (Fig. 3) (128-132).
Some cyanobacteria also have the arogenate Phe and tyrosine biosynthetic pathways (133-136); however, the plant pathways are not simply derived from cyanobacterial endosymbiosis, but were likely acquired through horizontal gene transfer from other bacterial lineages (137, 138). Prephenate aminotransferase (PPA-AT), which directs carbon flux toward the arogenate Phe and tyrosine pathways (Fig. 3) (126, 127, 139), evolved convergently in different microbial lineages from at least three distinct transaminase classes: Ib aspartate aminotransferase (e.g. in Chlorobi/Bacteroidetes, α-proteobacteria); N-succinyl-L,L-diaminopimelate aminotransferase (e.g. in actinobacteria); and branched-chain aminotransferase (e.g. in cyanobacteria) (135, 137, 140). Notably, plant PPA-ATs are most closely related to the Ib aspartate aminotransferase type of Chlorobi/Bacteroidetes (135, 137). Arogenate dehydratase (ADT) and dehydrogenase (TyrAa) enzymes catalyze the reactions subsequent to PPA-AT and produce Phe and tyrosine, respectively, from arogenate (Fig. 3). Although model microbes, such as E. coli and yeast, only have prephenate dehydratase (PDT) and dehydrogenase (TyrAp), some bacteria have ADT and TyrAa enzymes, which likely evolved through neofunctionalization of PDT and TyrAp, respectively, with a switch in their substrate specificity from prephenate to arogenate (136, 138, 141-146). Interestingly, all known plant ADTs are most closely related to those of Chlorobi/Bacteroidetes (137), suggesting that both PPA-AT and ADT enzymes required for the arogenate Phe pathway were transferred from Chlorobi/Bacteroidetes to the common ancestor of green algae and land plants. For tyrosine biosynthesis, plant TyrAa enzymes are most closely related to TyrAa enzymes of Spirochaetes and δ-proteobacteria (145), suggesting that yet another horizontal gene transfer contributed to the formation of the arogenate tyrosine biosynthetic pathway in plants.
More recent studies further revealed that some plants have an additional microbial-like Phe biosynthetic pathway that operates in the cytosol (Fig. 3) (147, 148), which might have provided robust production and homeostasis of Phe and of the diverse plant natural products derived from Phe. It has long been known that many plants have both plastidic and cytosolic CM enzymes, the latter of which are not feedback-regulated by aromatic amino acids (149-151). However, the in planta functions of the cytosolic isoforms had been enigmatic. Genetic down-regulation of the cytosolic CM gene in petunia flowers and wounded Arabidopsis leaves led to reduced production of Phe-derived compounds, e.g. phenylacetaldehyde. The cytosolic prephenate is further converted to phenylpyruvate by a partial PDT activity of some ADT isoforms that have dual localization to the plastids and cytosol due to an alternative transcription start site (148). Phenylpyruvate is then transaminated to Phe via cytosolic phenylpyruvate aminotransferase (PPY-AT) (Fig. 3) (147). The cytosolic CM orthologs are present in angiosperms but appear to be absent in gymnosperms, ferns, mosses, and Amborella trichopoda, an early-diverging flowering plant (148, 150, 152). Because plastidic ADT isoforms having PDT activity were found in Pinus pinaster (144), a part of the alternative phenylpyruvate Phe pathway may also take place in the plastids in nonflowering plants (153). Thus, some variations exist in the phenylpyruvate Phe pathway among different plant groups, at least in the subcellular localization of its enzymes. Future studies can explore potential variations of the Phe biosynthetic pathways among different plant lineages in both the arogenate and phenylpyruvate routes. Such variations, if any, will not only advance our understanding of the evolutionary history of this highly active amino acid pathway in plants, but also provide useful tools to further optimize the supply of the Phe precursor and the production of various phenolic compounds in plants.

Recent and lineage-specific diversification of primary metabolism within the plant kingdom
Besides the above ancient diversification of primary metabolism in the ancestor of Plantae, more recent diversifications of primary metabolism have been reported in specific lineages within the plant kingdom.

Diversification of the tyrosine biosynthetic pathways and their regulation
Figure 3. Evolutionary diversification of the aromatic amino acid biosynthetic pathways in plants. These aromatic amino acids, L-phenylalanine (Phe), L-tyrosine, and L-tryptophan, are required for protein synthesis in all organisms, but they are also used to synthesize diverse natural products (green) in plants. Plants synthesize Phe and tyrosine predominantly via the arogenate intermediate, unlike many microbes that make them via phenylpyruvate and 4-hydroxyphenylpyruvate intermediates, respectively (gray). Plants have an additional pathway to synthesize Phe in the cytosol. In certain plant lineages, the tyrosine and tryptophan pathways and their regulation have diversified: arogenate TyrA dehydrogenase (TyrAa) and anthranilate synthase α subunit (ASα) are typically strongly feedback-inhibited by tyrosine and tryptophan, respectively (red lines); however, their lineage-specific noncanonical counterparts (blue) are not and provide abundant tyrosine or anthranilate precursors for synthesis of downstream specialized metabolites (green). Dotted lines denote hypothesized but uncharacterized transport processes. Abbreviations: cCM, cytosolic chorismate mutase; pCM, plastidic CM; ncTyrAa, noncanonical TyrAa found in some dicots; TyrAaα, Caryophyllales-specific TyrAa. See the footnotes for other abbreviations introduced in the text.
Besides serving as a protein building block, L-tyrosine is utilized in plants to synthesize various natural products, such as tocopherols, plastoquinone, betalain pigments, cyanogenic glycosides, catecholamines, and various alkaloids (152). Unlike Phe-derived phenylpropanoids (e.g. lignin and flavonoids), these tyrosine-derived plant natural products are typically produced in specific plant lineages (152), with the exceptions of tocopherols and plastoquinone, which are ubiquitously found in plants and other photosynthetic microbes (154-157). Also, in most plants, tyrosine biosynthesis is less active than Phe biosynthesis and is strictly feedback-inhibited by tyrosine at the TyrAa enzymes (Fig. 3) (107, 131, 132, 152, 158). A recent study, however, identified TyrAa enzymes having relaxed sensitivity to the tyrosine-mediated feedback inhibition in the plant order Caryophyllales (159). Some Caryophyllales species uniquely produce red to yellow betalain pigments that replaced the more ubiquitous Phe-derived anthocyanin pigments (160-162). Betalain-producing species, such as beets, quinoa, spinach, and cacti, have at least two copies of recently duplicated TyrAa enzymes, TyrAaα and TyrAaβ. The TyrAaα enzymes exhibit substantially reduced sensitivity to tyrosine inhibition, with IC50 values of >1 mM as compared with ~50 μM for the other TyrAaβ copies (159) and typical TyrAa enzymes of plants (131, 132, 158). Some Caryophyllales lineages, such as the Caryophyllaceae family that includes carnation, reverted to anthocyanin pigmentation (162, 163) and also down-regulated or lost the TyrAaα gene (159). Further evolutionary analyses utilizing transcriptome data of over 100 Caryophyllales species, combined with characterization of their TyrA enzymes, revealed that the de-regulated TyrAaα evolved prior to the emergence of betalain pigmentation (159). The results suggest that a lineage-specific de-regulation of tyrosine biosynthesis contributed to the evolution of a downstream natural product pathway, betalain biosynthesis. The finding also suggests that enhanced supply of the tyrosine precursor is important for efficient production of tyrosine-derived natural products in plants.
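The IC50 values quoted above can be translated into remaining enzyme activity at a given tyrosine level. The sketch below assumes the simplest dose-response form, activity = 1 / (1 + [Tyr]/IC50), which is an assumption and not a fitted model from (159). The function name and the 100 μM tyrosine concentration are hypothetical, and 1 mM is used for TyrAaα only as the reported lower bound of its IC50.

```python
def remaining_activity(inhibitor_um, ic50_um):
    """Fraction of uninhibited activity, assuming a hyperbolic dose response."""
    return 1.0 / (1.0 + inhibitor_um / ic50_um)


tyrosine_um = 100.0          # hypothetical tyrosine concentration (uM)
ic50_beta_um = 50.0          # ~50 uM, as quoted for the TyrAa-beta copies
ic50_alpha_min_um = 1000.0   # >1 mM quoted for TyrAa-alpha; lower bound used

beta = remaining_activity(tyrosine_um, ic50_beta_um)
alpha_min = remaining_activity(tyrosine_um, ic50_alpha_min_um)
print(f"beta: {beta:.2f} of full activity; alpha: at least {alpha_min:.2f}")
```

Under these assumptions, the β copy retains about one-third of its activity at 100 μM tyrosine, whereas the α copy retains more than 90%.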
A further diversification of the tyrosine biosynthetic pathway was found in other plant lineages. Earlier biochemical studies detected microbial-like prephenate-specific TyrAp activity in some plants, all belonging to the legume family (164-166). More recently, the genes and enzymes responsible for the TyrAp activity were identified from soybean and Medicago and found to be far more specific to prephenate than to arogenate as substrate (kcat/Km of 100-200 versus 0.05-0.5 mM⁻¹ s⁻¹) (142, 145). Unlike TyrAa enzymes (167), these legume TyrAp enzymes are localized outside of the plastids and are completely insensitive to tyrosine feedback inhibition (Fig. 3) (142). The phylogenetic analyses of these legume TyrAp genes as well as other plant and microbial TyrA genes revealed that legume TyrAp orthologs were derived from two gene duplication events within the plant kingdom, with the more recent one occurring specifically in the legume family and giving rise to the prephenate-specific TyrAp (142). Interestingly, the first duplication event at the base of angiosperms (flowering plants) led to noncanonical TyrAa enzymes that are found in legumes and some other eudicots, but not in all plants. This third type of plant TyrA enzyme prefers arogenate as substrate (and is thus termed noncanonical TyrAa, ncTyrAa, Fig. 3), is partially insensitive to tyrosine inhibition, and is likely localized outside of the plastids, judging from the lack of a plastid transit peptide (145). Thus, legumes have at least three pathways to synthesize tyrosine. Additionally, a fourth pathway of tyrosine biosynthesis exists in some nonflowering plants, including mosses and gymnosperms, which have Phe hydroxylase (PheH) enzymes that are localized in the plastids and can convert Phe into tyrosine in a 10-formyltetrahydrofolate-dependent manner (Fig. 3) (168).
Although the physiological functions of these alternative tyrosine biosynthetic pathways are largely unknown, the PheH genes are up-regulated together with tyrosine degradation pathway genes under drought stress.
Thus, PheH may allow catabolism of Phe via tyrosine in nonflowering plants (169). Also, genes encoding the tyrosine-insensitive TyrAp enzymes were found to be highly expressed in several Inga species, tropical legume trees, that accumulate extremely high levels of tyrosine and/or tyrosine-derived natural products (e.g. tyrosine-gallate conjugates) at >10% of dry weight (170, 171). Therefore, these lineage-specific alternative tyrosine biosynthetic pathways and their regulation likely play important roles in the production and evolution of downstream specialized metabolites in plants.
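As a quick check on the catalytic efficiencies quoted above for the legume TyrAp enzymes, the sketch below converts the two reported kcat/Km ranges into a range of fold preference for prephenate over arogenate. The function name is hypothetical, and pairing the extremes of the two ranges is an assumption made only to bracket the ratio; the underlying studies (142, 145) may report enzyme-specific pairs.

```python
def fold_preference_range(preferred, other):
    """Bracket the ratio of two (low, high) kcat/Km ranges."""
    low = preferred[0] / other[1]    # weakest possible preference
    high = preferred[1] / other[0]   # strongest possible preference
    return low, high


prephenate = (100.0, 200.0)  # kcat/Km in mM^-1 s^-1, from the text
arogenate = (0.05, 0.5)      # kcat/Km in mM^-1 s^-1, from the text

low, high = fold_preference_range(prephenate, arogenate)
print(f"prephenate is preferred roughly {low:.0f}- to {high:.0f}-fold")
```

With the ranges given in the text, the bracket spans roughly 200- to 4000-fold.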

Lineage-specific de-regulation of anthranilate biosynthesis
The tryptophan branch of the aromatic amino acid pathways also provides precursors to synthesize various plant specialized metabolites, such as tryptophan-derived indole alkaloids and glucosinolates (172-175), anthranilate-derived anthranilamide phytoalexins (176), and indole-derived benzoxazinones (Fig. 3) (177, 178). Some species of the Rutaceae family produce anthranilate-derived acridone and furoquinoline alkaloids, some of which have antimicrobial activities and are strongly induced upon elicitor treatment (179). In Ruta graveolens, the induction of acridone alkaloid accumulation correlates with increased activity of anthranilate synthase (AS) (180), which catalyzes the first step of tryptophan biosynthesis and converts chorismate into anthranilate (Fig. 3) (181). AS is composed of two distinct subunits, ASα and ASβ, the former of which is usually strictly regulated by the pathway product, tryptophan (181-183). It was found that R. graveolens has two ASα copies, one of which is induced under pathogen infection and is not inhibited by tryptophan, whereas the other copy is noninducible and inhibited by tryptophan (184, 185). Thus, the lineage-specific duplication and neofunctionalization gave rise to the inducible and feedback-insensitive ASα enzyme, which diverts carbon flow away from tryptophan biosynthesis and provides the anthranilate precursor for the formation of acridone alkaloids in this plant (Fig. 3). Furthermore, the distinct temporal and possibly spatial expression patterns of ASα1 and ASα2 (184, 185) likely allow fine regulation of carbon allocation between biosynthesis of tryptophan- and anthranilate-derived plant natural products. Although the phylogenetic distribution and evolutionary history of the feedback-insensitive ASα enzyme are currently unknown, the emergence of the lineage-specific ASα likely provided a unique opportunity in some Rutaceae lineages to produce anthranilate-derived plant natural products.

Impacts of altered branched-chain amino acid biosynthesis on acylsugar specialized metabolism
Branched-chain amino acid biosynthesis has also been altered in a specific plant lineage, which impacted its downstream specialized metabolic pathways. Isopropylmalate synthase (IPMS) catalyzes the committed reaction of L-leucine biosynthesis, the conversion of 3-methyl-2-oxobutanoate (3MOB) into 2-isopropylmalate (Fig. 4) (186). Because 3MOB is also used for L-valine biosynthesis, IPMS is usually feedback inhibited by leucine, controlling carbon allocation between the leucine and valine biosynthetic pathways (187,188). Interestingly, the IPMS3 isoform has been altered at its C-terminal regulatory domain in wild and cultivated tomatoes, Solanum pennellii and Solanum lycopersicum, respectively. The IPMS3 isoform of S. lycopersicum is truncated and hence insensitive to leucine-mediated feedback inhibition (green, Fig. 4), whereas that of S. pennellii is further truncated into its catalytic domain and has lost its enzyme activity (blue, Fig. 4) (189). As a result, more carbon flows toward leucine biosynthesis in S. lycopersicum and toward valine biosynthesis in S. pennellii. Notably, the changes in leucine and valine biosynthesis at IPMS3 likely underlie the structural differences in their acylsugar-specialized metabolites (189), which accumulate in the glandular trichomes of Solanaceae plants as insecticides (190). The acylsugars of S. pennellii and S. lycopersicum have 2-methylpropanoic and 3-methylbutanoic acid (iC4 and iC5) acyl chains, which are derived from the corresponding branched-chain keto acids of valine and leucine, 3MOB and 4-methyl-2-oxopentanoate (4MOP), respectively (Fig. 4). These examples highlight the role of primary metabolite precursor supply in the formation and potentially the evolution of their downstream specialized metabolites in specific plant lineages.
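The effect of IPMS3 feedback regulation on carbon allocation at the 3MOB branch point can be illustrated with a deliberately simple toy calculation. This is only a sketch: the rate law (simple inhibition by leucine), the fixed valine-branch rate, and every parameter value are assumptions chosen for illustration and are not measurements from the cited studies.

```python
import math


def leucine_branch_fraction(relative_activity, ki_leu, leu=1.0, v_val=1.0):
    """Fraction of 3MOB flux entering the leucine branch in a toy model.

    Assumed rate law: IPMS rate = relative_activity / (1 + leu / ki_leu).
    The competing valine branch has a fixed rate v_val.
    All quantities are in arbitrary units (illustrative assumptions).
    """
    v_ipms = relative_activity / (1.0 + leu / ki_leu)
    return v_ipms / (v_ipms + v_val)


# Three hypothetical IPMS variants; ki_leu = infinity means no feedback inhibition.
variants = {
    "feedback-inhibited IPMS": leucine_branch_fraction(1.0, ki_leu=0.1),
    "feedback-insensitive IPMS (as in S. lycopersicum IPMS3)": leucine_branch_fraction(1.0, ki_leu=math.inf),
    "inactive IPMS (as in S. pennellii IPMS3)": leucine_branch_fraction(0.0, ki_leu=math.inf),
}

for name, fraction in variants.items():
    print(f"{name}: {fraction:.3f} of 3MOB flux to the leucine branch")
```

Under these assumptions, relaxing feedback inhibition shifts the split toward leucine, whereas loss of IPMS activity leaves all 3MOB for the valine branch, which mirrors the qualitative pattern described for the two tomato species.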

Genetic and molecular basis of primary metabolic diversity
With the advent of genome editing, the identification of alleles (mutations) underlying key metabolic innovations (e.g. primary metabolic diversification) is critical for introducing a specific genetic modification(s) for rational and precise metabolic engineering. Thus, we now have a strong rationale to go beyond gene discovery and conduct structure-function analyses of encoded enzymes to identify key amino acid residues and mutations. This is particularly crucial when we try to manipulate plant primary metabolism, which is highly sensitive to genetic modification due to its tight integration with complex metabolic networks and plant growth and physiology.

Phylogeny-guided structure-function analysis to identify mutations underlying key evolutionary innovations of plant metabolism
Comparative analyses of enzyme variants from different plant species and accessions have identified causal mutations responsible for unique biochemical properties (e.g. substrate and product specificities) that evolved in certain plant lineages (191)(192)(193)(194). A rapidly increasing number of genome and transcriptome sequences (195)(196)(197) is further enabling "phylogeny-guided" structure-function analyses, which determine and utilize evolutionary transitions (i.e. gain and loss) of a lineage-specific enzyme property (198). Two groups of closely related protein sequences with distinct biochemical characteristics can be compared to identify residues that are conserved in only one group. The key is to utilize a large number of genome/transcriptome sequences and to determine precise phylogenetic boundaries for the presence and absence of a certain biochemical property, so that the responsible residues can be pinpointed. Based on a protein crystal structure or a structure model, these candidate residues can be further prioritized for validation by site-directed mutagenesis followed by biochemical analyses. This approach not only reduces the number of sites for mutagenesis but also indicates which of the 19 alternative amino acids to introduce at each site. This method was recently employed to uncover the metabolic enzyme diversification underlying the chemical diversity of acylsugar-specialized metabolites among closely related species of the Solanum genus and the Solanaceae family (198-200). A similar approach has also been utilized to uncover the genetic basis of primary metabolic diversity in plants, as described below.

Figure 4. Isopropylmalate synthase (IPMS) catalyzes the committed step of L-leucine biosynthesis and is typically feedback-inhibited by leucine (red line). S. lycopersicum (cultivated tomato) has an IPMS3 enzyme (SlIPMS3) that is truncated at its C-terminal regulatory domain and thus insensitive to leucine, leading to active synthesis of 3-methylbutanoate (iC5)-acylsugar chains derived from the keto acid of leucine, 4-methyl-2-oxopentanoate (4MOP, green). In contrast, the IPMS3 enzyme of S. pennellii (wild tomato) is further truncated into the catalytic domain and lacks its activity, leading to active synthesis of 2-methylpropanoate (iC4)-acylsugar chains derived from the keto acid of valine, 3-methyl-2-oxobutanoate (3MOB, blue).
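The group comparison at the heart of the phylogeny-guided approach can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used in the cited studies. It assumes that the two groups of sequences have already been aligned to the same length; the function names, the thresholds, and the toy sequences are all hypothetical.

```python
from collections import Counter


def dominant_residue(column):
    """Return (residue, fraction) for the most common non-gap residue in a column."""
    counts = Counter(aa for aa in column if aa != "-")
    if not counts:
        return None, 0.0
    residue, n = counts.most_common(1)[0]
    return residue, n / len(column)


def group_specific_sites(group_a, group_b, min_conserved=0.9, max_shared=0.1):
    """List alignment positions conserved in group_a but (nearly) absent in group_b.

    group_a, group_b: lists of equal-length, gap-padded sequences taken from
    one multiple sequence alignment. Returns tuples of
    (1-based position, residue conserved in group_a, most common residue in group_b).
    """
    length = len(group_a[0])
    if any(len(seq) != length for seq in group_a + group_b):
        raise ValueError("all sequences must be aligned to the same length")
    sites = []
    for i in range(length):
        col_a = [seq[i] for seq in group_a]
        col_b = [seq[i] for seq in group_b]
        res_a, frac_a = dominant_residue(col_a)
        if res_a is None or frac_a < min_conserved:
            continue  # not conserved within group_a
        if col_b.count(res_a) / len(col_b) <= max_shared:
            res_b, _ = dominant_residue(col_b)
            sites.append((i + 1, res_a, res_b))
    return sites


# Toy aligned sequences (hypothetical, for illustration only).
with_property = ["MKDLAG", "MKDLSG", "MRDLAG"]
without_property = ["MKNLAG", "MKNLTG", "MRNLAG"]
print(group_specific_sites(with_property, without_property))
```

Each reported site names both the residue conserved in the first group and the most common residue in the second group, which suggests the substitution to test by site-directed mutagenesis. In practice the candidate list would still need to be prioritized against a crystal structure or a structure model, as described above.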

Molecular basis of the evolution of plant prephenate aminotransferases and the arogenate Phe and tyrosine pathways
PPA-ATs catalyze the committed step of the arogenate pathway of Phe and tyrosine biosynthesis (Fig. 3) (124-127, 139) and are found in plants and some microbes (135,137,140). Biochemical characterization of PPA-AT homologs from various plants and microbes determined the phylogenetic distribution of the functional orthologs that are capable of transaminating prephenate (137, 140). Sequence comparison of closely related aminotransferases with and without prephenate transamination activity identified two amino acid residues required for this activity. Mutating these two residues converted Arabidopsis PPA-AT into a general aromatic amino acid aminotransferase with broad substrate specificity (137). X-ray crystal structure analyses of plant and bacterial PPA-ATs further revealed the molecular basis of prephenate substrate recognition and identified two additional residues that further enhance prephenate specificity (140,201). Thus, these residues likely played key roles in the evolution of the PPA-ATs that allow plants to synthesize Phe and tyrosine via the arogenate pathway.

Determinants of TyrA dehydrogenase substrate specificity and feedback regulation
Phylogenetic sampling of TyrA orthologs across the eudicots also identified key residues underlying the evolutionary transition and emergence of prephenate dehydrogenase (TyrAp) from arogenate dehydrogenase (TyrAa) within the legume family (145). Comparisons of hundreds of protein sequences from before and after the evolutionary transition from TyrAa to TyrAp identified a highly conserved acidic aspartate residue that is responsible for the arogenate specificity and tyrosine sensitivity of TyrAa enzymes. Further crystal structure analyses demonstrated that the aspartate residue directly interacts with the side-chain amine that is present in arogenate and tyrosine but absent in prephenate (Fig. 3). Furthermore, mutating this aspartate residue in a feedback-inhibited canonical TyrAa enzyme from Arabidopsis reduced arogenate substrate specificity and introduced prephenate dehydrogenase activity while simultaneously relaxing the tyrosine feedback inhibition (145). Thus, the identified residue can now be utilized to relax negative regulation of tyrosine biosynthesis in nonlegume plants and to enhance tyrosine supply and the production of its downstream specialized metabolites. Of course, the situation may be more complex in other cases due to potential epistatic interactions between different amino acid residues. For example, introducing a functional mutation(s) may not be sufficient to provide a desired biochemical property if the background enzyme to be engineered either lacks a permissive mutation(s), which is required for the functionality of the introduced mutation(s), or carries a constraining mutation(s), which prevents it (202)(203)(204)(205)(206). Nevertheless, phylogeny-guided structure-function analyses provide powerful tools to identify key evolutionary innovations and natural mutations underlying both primary and specialized metabolic diversification. The identified mutations can then be used to conduct targeted metabolic engineering to redesign specific metabolic traits, such as optimization of primary metabolite precursor supply, as discussed in the following section.

Harnessing primary metabolic diversity for building and optimizing plant chemical production platforms
The fundamental knowledge about the evolutionary diversification of primary metabolism in plants can be utilized to build plant chassis, or chemical production platforms, and to further optimize their primary metabolism for efficient production of certain classes of natural products. The aforementioned studies suggest that the precursor supply needs to be optimized for efficient production of specialized metabolites, such as those derived from tyrosine and anthranilate, which typically accumulate at low concentrations in most plants. Indeed, simultaneous expression of the beet TyrAa and the downstream betalain biosynthetic enzymes in the Nicotiana benthamiana transient expression system demonstrated that enhanced supply of the tyrosine precursor increases the production of tyrosine-derived betalains (207). Even for the synthesis of terpenoid and phenylpropanoid compounds, which are supported in plants by the dual pathways of IPP and Phe biosynthesis, respectively, coordinated up-regulation of upstream primary metabolism ("push") and downstream natural product pathways ("pull") appears to be important (67, 208-210). For example, expression of AtMYB12, which activates the pentose phosphate, shikimate, and Phe pathways, in a tomato background expressing the Delila and Rosea 1 transcription factors, which activate anthocyanin biosynthesis, led to a further increase in anthocyanin accumulation (208).
Some microbial enzymes, which are often not subject to regulation in plants, have been introduced into plants to enhance the accumulation of some primary metabolites, such as amino acids (39, 211, 212). However, drastic alterations in primary metabolism often negatively impact plant growth and development, especially in vegetative tissues where many developmental processes are still taking place (39-44). For example, expression of a completely tyrosine-insensitive bacterial TyrAa or TyrAp enzyme in Arabidopsis severely compromised plant growth (212, 213). One way to overcome this issue is to use tissue-specific promoters, which has led to many successful cases of metabolic engineering in seeds and fruits (208, 211, 214-217). However, it is also important to explore the possibility of utilizing photosynthetically active tissues for industrial-scale production. These vegetative tissues comprise the majority of plant biomass, especially in perennial grasses, and have plentiful reducing energy and organic carbon, which are required for anabolic pathways such as natural product biosynthesis. Because natural variants of plant enzymes evolved in the context of plant metabolism over a long period of time, their identification can provide useful tools to optimize plant primary metabolism without severely compromising overall plant metabolism and growth (Fig. 5a). Indeed, heterologous expression of the partially deregulated TyrAa enzymes from beets enhanced the production of tyrosine while still maintaining growth in Arabidopsis (213). Moreover, specific mutations underlying unique alterations in primary metabolic enzyme properties, identified through the aforementioned phylogeny-guided biochemical approach, can also be introduced into the corresponding endogenous genes of host plants (Fig. 5b). Precise genome editing of a specific nucleotide base, such as with base editors (218,219), enables alteration of a specific biochemical trait(s) without using transgenic approaches.
Traditionally, microbial metabolic engineering and synthetic biology have been conducted using model organisms that are easy to manipulate in the laboratory, such as E. coli and Saccharomyces cerevisiae. More recently, attempts are being made to identify other chassis organisms that may be better suited for industrial production of certain compounds, such as Bacillus subtilis, Corynebacterium glutamicum, and Pseudomonas putida (220-225). Selecting and starting from naturally "tailored" organisms will be even more crucial in plants (Fig. 5c), because the scale of genetic manipulation, whether through transgenic approaches, gene editing, or mutagenesis screening, is much more limited in plants than in microbes. For example, the production of tyrosine-derived natural products (e.g. isoquinoline alkaloids and betalain pigments) may be better achieved in plants that already have feedback-insensitive or less-sensitive TyrA enzymes, such as legumes and Caryophyllales, respectively (142,159). These plants not only have high availability of a primary metabolite precursor (e.g. tyrosine) but have likely also tailored many other processes during evolution (e.g. adjustment of competing pathways and growth) to accommodate certain changes in primary metabolism. Further exploration of primary metabolic diversification will thus help identify candidate host plants (Fig. 5c), in which a certain downstream natural product pathway can be reconstructed with precisely targeted engineering of upstream primary metabolism (Fig. 5, a and b).

Exploring other instances of primary metabolic diversification in the plant kingdom
One broader question that remains to be answered is how widespread the diversification of plant primary metabolic pathways is, beyond the cases observed and discussed here. Although plants already have dual pathways to synthesize IPP and Phe, further exploration of potential natural variations in their pathway architecture, regulation, and enzymes will likely be fruitful in selecting ideal plant hosts and optimizing the supply of the IPP or Phe precursor for efficient production of terpenoid or phenolic compounds in plants.
For other metabolic branches, where and which pathways should we investigate next to identify other potential examples of primary metabolic diversity? One approach is to identify a plant species that accumulates extremely high concentrations (e.g. over 5% of dry weight) of certain natural products and to find the key gene(s)/mutation(s) responsible for this unusual accumulation. It was recently found that several Inga species of the legume family accumulate tyrosine derivatives at 5-20% of dry weight and have elevated expression of a gene encoding a tyrosine-insensitive TyrAp enzyme (171). Although this particular study was facilitated by prior knowledge of the presence of de-regulated TyrAp enzymes in legumes (142,145), we can now identify the underlying genetic basis using various approaches. First, we can conduct targeted analyses, such as comparative expression and biochemical analyses, on upstream biochemical steps that are known to be highly regulated (e.g. feedback-inhibited) or are located at a metabolic branch point with other pathways. Second, we may also be able to take more unbiased approaches, such as genome-wide association analysis (226,227), especially if natural populations with varied levels of a certain compound (or a certain class of compounds) can be identified. Rapidly decreasing costs of transcriptome and genome sequencing now allow both targeted and untargeted approaches even in nonmodel plant species.
Another approach to more broadly identify potential evolutionary diversification of primary metabolic enzymes is to utilize the wealth of publicly available transcriptome and genome sequences of diverse plants (195,197). Recent studies have predicted specialized versus primary metabolic enzymes in multiple plant genomes based on analyses of transcript co-expression and gene co-occurrence on genomes (e.g. gene clusters), and further by considering additional features (e.g. evolutionary and gene duplication properties) using machine learning (228-230). Some apparent false positives in these analyses, that is, genes/enzymes predicted to be specialized but traditionally annotated as primary metabolic enzymes, can be interesting targets for further exploration. These genes/enzymes might have neofunctionalized, likely after gene duplication, to acquire unique biochemical properties, like de-regulated IPMS or AS (184,231). Alternatively, they may indeed function as specialized metabolic enzymes that were recruited from primary metabolic enzymes (45), like methylthioalkylmalate synthase in glucosinolate biosynthesis, which was originally derived from IPMS (187). Therefore, empirical examination of candidate genes/enzymes through biochemical and genetic characterization, in collaboration with computational biologists, will be critical to identify relatively rare but key evolutionary innovations in primary metabolic diversity.
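The candidate selection described above amounts to finding disagreements between a classifier's prediction and the traditional annotation. A minimal sketch follows, assuming each gene record carries an annotation label and a classifier score; the record keys, the score threshold, and the gene identifiers are hypothetical and are not taken from the cited studies.

```python
def flag_annotation_conflicts(genes, min_score=0.8):
    """Return genes annotated as primary metabolism but predicted as specialized.

    Each gene is a dict with hypothetical keys:
      "id"                - gene identifier
      "annotation"        - "primary" or "specialized" (traditional annotation)
      "specialized_score" - classifier score between 0 and 1
    Results are sorted from highest to lowest score.
    """
    flagged = [
        g for g in genes
        if g["annotation"] == "primary" and g["specialized_score"] >= min_score
    ]
    return sorted(flagged, key=lambda g: g["specialized_score"], reverse=True)


# Hypothetical records for illustration only; not from any real genome.
genes = [
    {"id": "geneA", "annotation": "primary", "specialized_score": 0.93},
    {"id": "geneB", "annotation": "primary", "specialized_score": 0.12},
    {"id": "geneC", "annotation": "specialized", "specialized_score": 0.97},
    {"id": "geneD", "annotation": "primary", "specialized_score": 0.85},
]
candidates = [g["id"] for g in flag_annotation_conflicts(genes)]
print(candidates)
```

Such a list is only a starting point: whether a flagged gene encodes a neofunctionalized primary enzyme or a genuinely specialized enzyme still has to be settled by biochemical and genetic characterization.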

Conclusions and future perspectives
Plant-based production of plant natural products and other biomaterials at a commercial scale may be too difficult to achieve in a short time frame (<5 years); however, some complex natural products (e.g. morphine alkaloids and artemisinin) accumulate at high levels and are being produced commercially in plants, thanks to years of cultivation and breeding (232)(233)(234). Also, some plants can naturally accumulate certain natural products at extremely high concentrations (over 5% dry weight) (170, 235-237). Thus, plant-based chemical production is possible. We have to come up with strategies to overcome the challenges and to quickly find and redesign ideal plant hosts (without hundreds of years of breeding), so that plants can provide alternative resources and platforms to produce various chemicals in the near future. Rapidly growing numbers of plant genome and transcriptome data are facilitating our efforts to identify both specialized and primary metabolic enzymes and pathways uniquely present in specific plant lineages. Besides identifying novel specialized metabolic pathways from various plants and introducing them into a host plant (Fig. 5, green), here I emphasize the importance of optimizing the plant primary metabolic pathways that provide precursors for the formation of downstream natural products and other target chemicals (Fig. 5, blue). One strategy to overcome this major challenge is to harness primary metabolic diversity that evolved in certain plant lineages. This can be envisioned in three ways: (a) identifying and introducing natural variants of plant primary metabolic enzymes; (b) determining the mutations underlying these natural enzyme variations and introducing them into endogenous genes of host plants; and/or (c) finding and utilizing naturally "tailored" plant hosts for synthesizing a certain class of compounds (Fig. 5). Of course, these strategies are best combined with prior knowledge and advanced technologies of transcriptional regulation, including tissue-specific or inducible expression systems as well as modular assembly of standardized DNA parts (238,239). It will also be important to couple them with improved metabolic sinks (e.g. vacuole transport and storage) (240) and down-regulation of competing and catabolic pathways. Unlike the exploration of unknown specialized metabolic pathways, that of primary metabolism may not lead to novel gene and enzyme discoveries. However, the identification of relatively rare but key alterations in plant primary metabolism, especially at the interface with specialized metabolism, will provide critical information for the rational selection of plant chassis and for further optimization of primary metabolic pathways (Fig. 5). Uncovering primary metabolic diversity in different plant lineages thus holds a key to achieving sustainable and sizable production of natural and bio-based products in plants.