Identification of a Flavonol 7-O-Rhamnosyltransferase Gene Determining Flavonoid Pattern in Arabidopsis by Transcriptome Coexpression Analysis and Reverse Genetics

Glycosylation plays a major role in the remarkable chemical diversity of flavonoids in plants including Arabidopsis thaliana. The wide diversity encoded by the large family-1 glycosyltransferase (UGT) gene family makes it difficult to determine the biochemical function of each gene solely from its primary sequence. Here we used transcriptome coexpression analysis combined with a reverse genetics approach to identify a gene that is prominent in determining the flavonoid composition of Arabidopsis. Using transcriptome coexpression analysis accessible on the ATTED-II public data base, the expression pattern of a UGT gene, UGT89C1, was found to be highly correlated with known flavonoid biosynthetic genes. No C-7 rhamnosylated flavonols were detected in either of two T-DNA ugt89c1 mutants. This specific metabolite deficiency in the mutants was complemented by stable transformation with the genomic fragment containing intact UGT89C1. Glutathione S-transferase-fused recombinant UGT89C1 protein converted kaempferol 3-O-glucoside to kaempferol 3-O-glucoside-7-O-rhamnoside and recognized 3-O-glycosylated flavonols and UDP-rhamnose as substrates, but not flavonol aglycones, 3-O-glycosylated anthocyanins, or other UDP-sugars. These results show that UGT89C1 is a flavonol 7-O-rhamnosyltransferase. The abundance of UGT89C1 transcripts in floral buds was consistent with the accumulation of C-7 rhamnosylated flavonols in Arabidopsis organs. Our present study demonstrates that the integration of transcriptome coexpression analysis with a reverse genetic approach is a versatile tool for understanding a multigene family of a metabolic pathway in Arabidopsis.

that five unknown UGT genes are involved in anthocyanin and/or flavonol glycosylation in Arabidopsis in addition to the previously identified four UGTs (flavonoid 3GlcT, flavonol 3RhaT, anthocyanin 5GlcT, and flavonol 7GlcT). A majority of Arabidopsis flavonols (i.e. seven of eight flavonols whose structures have been determined) are 7-O-rhamnosylated (13)(14)(15)(16). It is apparent that a flavonol 7-rhamnosyltransferase (7RhaT) is a prominent enzyme that is largely responsible for the determination of flavonoid patterns in Arabidopsis. The flavonol 7-O-rhamnosides are not limited to Arabidopsis but occur in a number of plant species (17). Of the 1,331 known flavonol O-glycosides, 140 are 7-O-rhamnosides (10.5%) (17). Of the 1,661 flavonol O-glycosides registered in the KNApSAcK data base, A Comprehensive Species-Metabolite Relationship Database (18), flavonol 7-rhamnosides account for 11.6%. However, no gene coding for 7RhaT has been identified despite its importance in flavonoid biosynthesis in Arabidopsis and other plant species. The huge diversity of UGTs makes it difficult to define their biochemical functions solely by their primary structures, as a simple phylogenetic relationship based upon primary structure can give ambiguous and often misleading predictions of biochemical function. Therefore, a clearer understanding of the flavonoid-modifying enzymes may be gained by developing novel criteria from genome-based information.
Recently, the integration of transcriptome and metabolome analyses has emerged as a promising technology for functional genomics (12,19,20). In addition, publicly available Arabidopsis transcriptome databases have been markedly improved and expanded. Notably, The Arabidopsis Information Resource (TAIR) (21) and the Nottingham Arabidopsis Stock Centre Arrays (NASCArrays) (22) have become available as Arabidopsis-specific repositories for microarray data. Furthermore, a number of secondary databases based on microarray data are accessible on-line: ATTED-II, a data base for co-regulated gene relationships in Arabidopsis thaliana to estimate gene functions (23); PRIMe (Platform for RIKEN Metabolomics), a data base for integrated analysis of transcriptomics and metabolomics; a co-response data base in the comprehensive system-biology data base (CSB.DB); and Genevestigator, an A. thaliana microarray data base and analysis toolbox. Utilization of these public databases may greatly help to identify genes of interest for particular biochemical functions.
In the present study, we identified at least one flavonoid UGT gene from among 107 candidates using metabolomics, transcriptomics, and reverse genetics as a first step to complete the model flavonoid pathway in Arabidopsis. By coexpression analysis of transcriptome data sets of TAIR, we found that the expression of a UGT gene, UGT89C1, is highly correlated with the expression of known flavonoid biosynthetic genes. Flavonoid profiling in ugt89c1 knock-out mutants and a recombinant protein assay confirmed that UGT89C1 catalyzes rhamnosylation at the C-7 position of flavonols. Our results indicate that a comprehensive strategy combining coexpression analysis, metabolic profiling, reverse genetics, and biochemistry can be a versatile tool for the functional identification of genes that belong to a multigene family and to complete the model of a particular metabolic pathway in Arabidopsis.

EXPERIMENTAL PROCEDURES
Plant Materials-A. thaliana plants (ecotype Columbia-0) were grown on germination medium (24) at 22°C under 16-h/8-h light and dark cycles. The light intensity was 40 μmol of photons m⁻² s⁻¹. T-DNA insertion lines for UGT89C1 were obtained from the SALK Institute and were screened by PCR using specific primers for T-DNA and UGT89C1, UGT89C1f, UGT89C1r, LBa1, and RBa1 (supplemental Table 1). PCR products were sequenced to determine the exact insertion points.
Chemicals-Chemicals of the highest grade commercially available were used unless specifically noted. Anthocyanin and flavonol standards were purchased from Extrasynthese and Analyticon. Kaempferol 3-O-α-L-rhamnoside and kaempferol 3,7-O-bis-α-L-rhamnoside were kindly provided by Professor H. Takayama, Chiba University, Japan. UDP-rhamnose synthesized as described previously (25) was purchased from Funakoshi. The resultant product consisted of UDP-rhamnose, UMP, UDP, and the condensation product of UDP-diphenylphosphate with UMP. The purity of UDP-rhamnose was determined by liquid chromatography-mass spectrometry (MS) as described previously (11).
Coexpression Analyses-Coexpression analyses were carried out using a Coexpression Gene Search algorithm on the RIKEN PRIMe web site. The Coexpression Gene Search program is a web-based application designed to identify correlated genes from gene expression data produced using Affymetrix GeneChip technology by the AtGenExpress consortium (RIKEN Plant Science Center and the Max-Planck Institute for Molecular Plant Physiology) deposited in TAIR. Data for version 1 (all data sets, version 1, tissue and development, stress treatments, and hormone treatments) were used as a matrix for the analyses. Correlation coefficients have already been released in ATTED-II (23). To minimize the effects of experimental artifacts, data were renormalized, and Pearson's correlation coefficient between genes was weighted in ATTED-II. The Pajek program was used for output.
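The core quantity behind such a search, Pearson's correlation coefficient between two expression profiles, can be illustrated with a minimal Python sketch. This is not the ATTED-II or PRIMe implementation: it uses the plain, unweighted coefficient, whereas ATTED-II renormalizes the data and weights the coefficient. The function names and the toy expression values are hypothetical.

```python
import math


def pearson(x, y):
    """Unweighted Pearson correlation between two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("constant profile: correlation is undefined")
    return cov / math.sqrt(var_x * var_y)


def coexpressed_with(bait, candidates, threshold=0.6):
    """Return {name: r} for candidates whose r with the bait exceeds threshold."""
    hits = {}
    for name, profile in candidates.items():
        r = pearson(bait, profile)
        if r > threshold:
            hits[name] = r
    return hits


# Hypothetical toy profiles (arbitrary units), one value per array experiment.
bait_profile = [1.0, 2.0, 4.0, 8.0]
candidate_profiles = {
    "cand_a": [1.5, 2.5, 4.5, 9.0],  # rises with the bait
    "cand_b": [5.0, 5.5, 4.0, 5.2],  # roughly flat
}
hits = coexpressed_with(bait_profile, candidate_profiles)
```

With these toy values only cand_a passes the cutoff. The default of 0.6 mirrors the cutoff that the Results section applies to the all-data-set correlations.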
General Molecular Procedures-The molecular procedures used were as described previously (26) unless otherwise specified. A phylogenetic tree was generated from a multiple alignment by the CLUSTALW program at the DNA Data Bank of Japan web site using the neighbor joining method of the TREEVIEW program (27). Reverse transcription (RT)-PCR was performed as described previously (28) with primers ugt89c1-1f and ugt89c1-1r for ugt89c1-1, ugt89c1-2f and ugt89c1-2r for ugt89c1-2, and TUBf and TUBr for tubulin (GenBank™ accession number AK117431) (supplemental Table 1).
Plant Transformation-A 3-kb genomic fragment covering the 1,544 bp of the promoter region, the entire UGT89C1 coding region, and 187 bp of the 3′ non-coding region was amplified by PCR with primers UGT89C1-GWf and UGT89C1-GWr and used as a genomic clone for the complementation test. The Gateway system was used for construction of the binary vector for Arabidopsis transformation. The PCR product was cloned into the pENTR™/D-TOPO vector (Invitrogen Japan) as an entry vector and sequenced to confirm the absence of PCR errors. pGWB1 was used as a destination vector, and the LR reaction for the binary vector, pKYS320, was catalyzed by the Gateway LR clonase enzyme mix (Invitrogen). pKYS320 was transformed into Agrobacterium tumefaciens GV3101(pMP90) by the freeze and thaw method (29), and Arabidopsis plants were transformed by the floral dip method (30). Transgenic T2 plants were selected on 1/2 Murashige-Skoog medium containing 25 mg liter⁻¹ hygromycin B and 50 mg liter⁻¹ carbenicillin disodium salt. mRNA accumulation in T2 plants was checked by RT-PCR as described above.
Expression and Purification of Recombinant UGT89C1 Protein-A full-length cDNA clone of UGT89C1 (pda08132) was obtained from the RIKEN BioResource Center Arabidopsis full-length cDNA collection (31,32). The full-length UGT89C1 was amplified by PCR using the primers UGT89C1-BamHI and UGT89C1-PstI (supplemental Table 1), and the amplified fragment was cloned into pCR-Blunt II-TOPO vector (Invitrogen) to confirm the nucleotide sequence and into pET-41b(+) vector (Novagen, San Diego, CA). Escherichia coli strain BL21star(DE3) was used as a host for expression. Transformed cells were cultivated at 37°C until A600 reached 0.5. After the addition of isopropyl-β-D-thiogalactopyranoside to a final concentration of 1 mM, cells were cultured at 20°C for 4 h. The cells were collected, and the protein was purified as a glutathione S-transferase (GST) fusion according to the manufacturer's instructions.
Enzyme Assays-The standard enzyme assay reaction mixture (final volume, 100 μl) consisted of 50 mM potassium phosphate, pH 7.0, 150 μM flavonoid substrate, and 500 μM UDP-sugar. The mixture was preincubated at 30°C for 2 min, and the reaction was started by the addition of enzyme. Reactions were stopped after 0, 4, 8, or 12 min of incubation at 30°C by the addition of 100 μl of ice-cold 1.0% (v/v) trifluoroacetic acid, and the supernatant was recovered by centrifugation at 12,000 × g for 3 min. Flavonoids in the resultant solution were analyzed as described below.
Flavonoid Analysis by UPLC™/PDA/ESI-Q-TOF/MS-Frozen Arabidopsis leaves were homogenized in extraction solvent (methanol:H2O = 4:1) with 5 μl of solvent/mg of fresh weight in a mixer mill (MM300; Retsch GmbH & Co. KG) for 3 min at 30 Hz. After centrifugation at 12,000 × g for 10 min, cell debris was discarded, and supernatants were recentrifuged. The resultant supernatants were immediately analyzed with a Waters Acquity UPLC system (Waters Corp.) fitted with a Q-TOF Premier mass spectrometer (Micromass MS Technologies). A 2-μl sample was applied to an ACQUITY UPLC BEH C18 column (⌀2.1 × 100 mm, 1.7 μm, Waters) at a flow rate of 0.5 ml/min with linear gradients of solvent A (0.1% formic acid in H2O) and solvent B (0.1% formic acid in methanol) set according to the following profile: 0 min, 95% solvent A + 5% solvent B; 9 min, 60% solvent A + 40% solvent B; 11 min, 100% solvent B; 13 min, 95% solvent A + 5% solvent B. The column temperature was 35°C. Photodiode array (PDA) was used for detection of UV-visible absorption in the range of 210-500 nm. Electrospray ionization (ESI) was used in positive mode. The TOF mass analyzer was used for detection of flavonoid glycosides [M+H]+ and fragment ion peaks in positive ion mode scanning with the following settings: desolvation temperature was 450°C at a nitrogen gas flow rate of 600 liters/h, capillary spray 3.2 kV, source temperature 150°C, and cone voltage 35 V.
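The gradient profile above can be written as a small lookup that returns the solvent B percentage at any time point. The Python sketch below is illustrative only. It assumes straight-line interpolation between every pair of listed breakpoints, including the final return to 5% solvent B between 11 and 13 min, which the instrument method may instead have implemented as a step. The names GRADIENT and percent_b are hypothetical.

```python
# Breakpoints from the text: (time in min, % solvent B); solvent A is 100 - B.
GRADIENT = [(0.0, 5.0), (9.0, 40.0), (11.0, 100.0), (13.0, 5.0)]


def percent_b(t_min, profile=GRADIENT):
    """Percentage of solvent B at time t_min, by linear interpolation."""
    if not profile[0][0] <= t_min <= profile[-1][0]:
        raise ValueError("time is outside the programmed gradient")
    for (t0, b0), (t1, b1) in zip(profile, profile[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


def percent_a(t_min, profile=GRADIENT):
    """Percentage of solvent A at time t_min (the remainder of the mix)."""
    return 100.0 - percent_b(t_min, profile)


# Total mobile-phase volume for one run at the stated flow rate of 0.5 ml/min.
run_volume_ml = 0.5 * GRADIENT[-1][0]
```

Under these assumptions the mixture halfway through the first segment (4.5 min) is 22.5% solvent B, and one 13-min run consumes 6.5 ml of mobile phase.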
Identification of the peaks in the plant extracts was based on comparisons of retention times, UV-visible absorption spectra, and mass fragmentation patterns by tandem MS analysis of the flavonoid standards. Other flavonoids with no standard compounds were annotated by comparison with the reported data in the UV-visible absorption spectra, elution time, m/z values, and MS2 fragmentation patterns.
Quantitative Real-time PCR-RNA extraction and cDNA synthesis were performed as described previously (28). The developmental stage of each organ used for analysis was as follows: leaves, 18 days after sowing; stems, 25 days after sowing; roots, 8 days after sowing; siliques, 11 days after flowering.
Accumulation levels of the UGT89C1 transcripts were analyzed by a real-time PCR method, with an ABI PRISM 7500 real-time PCR system (Applied Biosystems) monitoring the amplification with the SYBR-Green I dye (Applied Biosystems). The primers UGT89C1-RTf, UGT89C1-RTr, TUB-RTf, and TUB-RTr (supplemental Table 1) were designed using Primer Express software (Applied Biosystems) and checked for specific product formation by a dissociation program. In each case, plasmid DNA containing the corresponding UGT89C1 or tubulin was used as a template to generate a calibration curve.
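A plasmid calibration curve of this kind is commonly used by regressing the threshold cycle (Ct) on the logarithm of the template amount and then inverting the fitted line for unknown samples. The Python sketch below shows that general approach under stated assumptions. It is not taken from the instrument software, the text does not describe the fitting procedure, and the dilution series and Ct values are hypothetical numbers constructed to lie exactly on a line.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def copies_from_ct(ct, slope, intercept):
    """Invert the calibration line: Ct = slope * log10(copies) + intercept."""
    return 10 ** ((ct - intercept) / slope)


# Hypothetical plasmid dilution series: log10(copy number) and measured Ct.
log10_copies = [3, 4, 5, 6, 7]
ct_values = [28.1, 24.8, 21.5, 18.2, 14.9]

slope, intercept = fit_line(log10_copies, ct_values)

# Hypothetical sample Ct values for the target and the tubulin reference;
# for simplicity both are read off the same curve here, whereas the text
# describes a separate calibration curve for each gene.
target_copies = copies_from_ct(21.5, slope, intercept)
reference_copies = copies_from_ct(18.2, slope, intercept)
relative_level = target_copies / reference_copies
```

In practice each gene would be quantified against its own curve before the ratio to tubulin is taken.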

RESULTS
Coexpression Analysis to Predict the Genes Encoding Flavonoid-related Glycosyltransferases-To narrow the number of potential UGT candidates involved in flavonoid modification, we used coexpression analyses within the ATTED-II data base in a coexpression gene search program on the RIKEN PRIMe web site. First, we verified the reliability of the data base by searching correlation coefficients among the known genes encoding flavonoid biosynthetic enzymes. The linkages between genes that had a higher correlation coefficient (r > 0.6) in all GeneChip data are presented in Fig. 1. Genes involved in flavonoid biosynthesis were divided into two groups by this coexpression analysis. The first group includes chalcone synthase (CHS), chalcone isomerase (CHI), flavanone 3-hydroxylase (F3H), flavonoid 3′-hydroxylase (F3′H), and flavonol synthase (FLS), and the second group contains dihydroflavonol reductase (DFR), anthocyanidin synthase (ANS), and GST (TT19). The members of each resultant group are consistent with their classification as "early" (CHS, CHI, F3H, and FLS) or "late" genes (DFR and ANS) of anthocyanin biosynthesis (33,34). In addition, this result suggested that F3′H is an early gene and GST is a late gene from the viewpoint of coexpression patterns. Coexpression analysis also indicated that two MYB transcriptional regulators, MYB12 for flavonol (35) and PAP1 for anthocyanin (36), form coexpression networks with structural genes for the biosynthesis of flavonols and anthocyanins, respectively. PAP1 showed no correlation with DFR, ANS, or GST in the all-data-set matrix. However, in the stress treatment sub-data set, PAP1 correlated with DFR (r = 0.636), ANS (r = 0.587), and GST (r = 0.660). MYB12 also correlated with CHS (r = 0.677) and F3′H (r = 0.706) in the tissue and development sub-data set. These results suggest that correlation analysis can be useful for finding functional clues about uncharacterized genes in the flavonoid biosynthesis pathway.
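Grouping genes by thresholded correlation amounts to drawing an edge between every pair with r above the cutoff and reading off the connected components. The Python sketch below illustrates this with the five coefficients quoted above. Applying the r > 0.6 cutoff to these sub-data-set values is an illustrative assumption, since in the analysis the cutoff was applied to the all-data-set correlations, and the function and variable names are hypothetical.

```python
from collections import defaultdict

# Coefficients quoted in the text (stress and tissue/development sub-data sets).
REPORTED_R = {
    ("PAP1", "DFR"): 0.636,
    ("PAP1", "ANS"): 0.587,
    ("PAP1", "GST"): 0.660,
    ("MYB12", "CHS"): 0.677,
    ("MYB12", "F3'H"): 0.706,
}


def coexpression_groups(pair_r, threshold=0.6):
    """Connected components of the graph whose edges are pairs with r > threshold."""
    adjacency = defaultdict(set)
    nodes = set()
    for (gene_a, gene_b), r in pair_r.items():
        nodes.update((gene_a, gene_b))
        if r > threshold:
            adjacency[gene_a].add(gene_b)
            adjacency[gene_b].add(gene_a)
    groups = []
    seen = set()
    for start in sorted(nodes):
        if start in seen:
            continue
        # Depth-first walk collecting every gene reachable from the start gene.
        component = set()
        stack = [start]
        while stack:
            gene = stack.pop()
            if gene in component:
                continue
            component.add(gene)
            stack.extend(adjacency[gene] - component)
        seen |= component
        groups.append(component)
    return groups


groups = coexpression_groups(REPORTED_R)
```

With this cutoff, PAP1 groups with DFR and GST, MYB12 groups with CHS and F3'H, and ANS is left unlinked because its coefficient with PAP1 (0.587) falls just below 0.6.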
To identify novel UGT genes likely to be involved in the flavonoid biosynthetic pathway, we analyzed the correlation networks of 107 UGT genes (37) using the known structural and regulatory genes (i.e. CHS, CHI, F3H, F3′H, FLS, DFR, ANS, GST, PAP1, and MYB12) as "bait" (Fig. 1). Significantly positive correlations were observed between the bait genes involved in flavonoid biosynthesis and five UGT genes: At4g14090 (UGT75C1, anthocyanin 5GlcT) (12), At5g17050 (UGT78D2, flavonoid 3GlcT) (12), At5g54060 (UGT79B1), At4g15480 (UGT84A1), and At1g06000 (UGT89C1). At3g21560 (UGT84A2) was correlated with At5g17050 (UGT78D2). Although flavonoid 3GlcT (Fd3GlcT) catalyzes 3-O-glucosylation of both flavonols and anthocyanidins, it fell into the early genes cluster, implying that the early genes are involved both in biosynthesis of flavonol and in the early steps of anthocyanin production. Anthocyanin 5GlcT fell within the late genes cluster. These results suggest that three UGTs (UGT84A1, UGT84A2, and UGT89C1) are involved in flavonol and/or anthocyanidin glycosylation and that UGT79B1 may be involved only in anthocyanin glycosylation. UGT89C1 was strongly correlated with the flavonol biosynthetic genes CHS, CHI, F3H, and FLS and with Fd3GlcT. We thus gave priority to the characterization of UGT89C1 among the four unknown UGTs forming coexpression networks in the flavonoid pathway.
Flavonols, but not anthocyanins, that are glycosylated with glucose or rhamnose at their C-7 position have been identified in Arabidopsis (12,15), implying the presence of 7GlcT and 7RhaT, and the gene encoding flavonol 7GlcT has already been identified (11). It is likely, then, that UGT89C1 encodes a novel flavonol 7RhaT.
Analysis of T-DNA Mutants of UGT89C1-To confirm the physiological function of UGT89C1 in Arabidopsis, two independent T-DNA insertion lines, SALK_068559 and SALK_071113, were designated as ugt89c1-1 and ugt89c1-2, respectively. The T-DNA was inserted in the exon of UGT89C1 in both mutants, between positions +1034 and +1042 and between positions +78 and +141 base pairs, respectively, from the putative transcription start site proposed from the full-length cDNA sequence (GenBank accession number AY093133) (Fig. 3A). No transcripts of UGT89C1 were detected by RT-PCR in homozygotes of either line (Fig. 3B), and there were no obvious phenotypic abnormalities in the mutant plants (data not shown).
To demonstrate that the metabolic phenotype deficient in 7-O-rhamnosylated flavonols is correctly ascribed to the UGT89C1 mutation, both ugt89c1-1 and ugt89c1-2 plants were transformed with a genomic clone of UGT89C1. Eight independent transgenic lines that accumulate UGT89C1 transcripts had the same flavonoid profile as wild-type plants (Fig. 4B). All of these in vivo data indicate that UGT89C1 encodes UDP-rhamnose:flavonol 7-O-rhamnosyltransferase.
In Vitro Characterization of Recombinant UGT89C1-Recombinant UGT89C1 protein was expressed in E. coli as a GST fusion and purified. The GST-UGT89C1 fusion catalyzed the conversion of kaempferol 3-O-glucoside to a single product, kaempferol 3-O-glucoside-7-O-rhamnoside (Fig. 5), as confirmed by UPLC retention time and MS2 ionization when compared with the standard compound. GST alone as a negative control catalyzed no conversion to the 7-O-rhamnoside. Thus, UGT89C1 can be defined as a flavonol 7RhaT.

Expression of UGT89C1 and Flavonol Accumulation in Arabidopsis-The accumulation of UGT89C1 transcripts in plant organs was measured by real-time PCR (Fig. 6A). Transcripts of UGT89C1 were detected in all tissues tested but were particularly abundant in floral buds and at very low levels in roots and siliques. This accumulation pattern was consistent with the distribution of UGT89C1-MPSS signatures in the Arabidopsis MPSS data base, in which UGT89C1 expression was strong in floral buds but weak in leaves, roots, and siliques.
The accumulation levels of three kaempferol glycosides (f1-f3) and three quercetin glycosides (f5, f6, and f8) in leaves, floral buds, and flowers were determined by UPLC/PDA/ESI-Q-TOF/MS. Concentrations of these six flavonols in buds and flowers ranged from ~10- to 70-fold higher than in leaves (Fig. 6B). These organ-specific concentrations are highly correlated with the pattern of UGT89C1 transcript accumulation, as the final metabolic product would be expected to accumulate in a subsequent developmental stage. The kaempferol glycosides were the predominant flavonols in leaves, making up 97% of total flavonoids. However, 25% of total flavonoids in floral buds and flowers were quercetin glycosides.

Evaluation of Coexpression Analysis for Functional Genomics and Other Candidate Genes Involved in Flavonoid Biosynthesis-Flavonol 7RhaT contributes significantly to the flavonoid composition of Arabidopsis. Of the eight known flavonol glycosides in Arabidopsis, seven are flavonol 7-O-rhamnosides (13). In the present study, we could efficiently target flavonoid-related UGTs among 107 candidates by coexpression analyses and were able to identify a flavonol 7RhaT gene by a combination of reverse genetics and the enzymatics of recombinant proteins.
Results of our coexpression analyses are in remarkable agreement with the sum of what is known about flavonoid biosynthesis, such as classification of genes as either early or late. The results also confirmed what is known about tissue-specific timing of expression of the flavonoid biosynthetic genes. UGT89C1 was also significantly correlated with the putative nucleoside diphosphate-rhamnose synthase gene, At1g78570 (r = 0.641, all data sets; 0.765, tissue and development; 0.763, stress treatments). Besides UGT89C1, At1g78570 also correlated with another UGT, At1g30530 (flavonoid 3RhaT, r = 0.654 from the stress matrix). These data suggest that At1g78570 may function as a UDP-rhamnose synthase and that the expression of UGT89C1 is well coordinated with the biosynthetic genes of UDP-rhamnose and the flavonols. Among the known flavonoid UGTs, only flavonol 7GlcT showed no correlation with the flavonoid biosynthetic genes. Presumably this is due to the unique distribution of flavonol 7GlcT transcripts, which accumulate in leaves and flowers but not in roots, siliques, or stems, despite the constitutive expression of other flavonoid biosynthetic genes (11). This observation is also supported by "Digital in situ" in The Arabidopsis Gene Expression Data base (38), which indicates that flavonol 7GlcT expression is completely different from other flavonoid biosynthetic genes, even within the same organ (supplemental Fig. 2). The relationship among biosynthetic genes and transcription factors is also not so obvious with all data sets of the transcriptome. MYB12 and PAP1 link to flavonoid biosynthetic genes, but the regulators TTG1, GL3, and EGL3 do not. The lack of linkage by TTG1, GL3, and EGL3 may be due to their unique transcript distribution for root hair formation and mucilage production apart from flavonoid biosynthesis and/or their lower expression levels when compared with structural genes and MYB12/PAP1.
Correlation coefficients in a limited category or under highly specified conditions may be a much more effective index than any of the existing data sets for finding a particular gene, or set of genes, but it is difficult to find a gene that is expressed only under very limited conditions using coexpression analysis of many data sets because the limited number of correlations reduces its correlation coefficient to close to the background noise level. Metabolite target analysis and microarray analysis using a single cell from each of the various organs, or preferably, specific tissues within an organ, will be required to eliminate "signal noise" from genes that overlap the pathway of interest. However, identification of UGT89C1 is a clear example of a proof-of-concept for filling in the blanks in a metabolic scheme.
The Complete Flavonol Glycosylation Pathway in Arabidopsis-Integration of UGT89C1 substrate preference with previous reports allows us to draw the complete flavonol glycosylation pathway in Arabidopsis (Fig. 7). The substrate preference of UGT89C1 suggests that flavonol 7-O-rhamnosylation occurs after 3-O-glycosylation. The exact pathway to kaempferol 3-O-
Flavonol profiles in the roots of wild-type plants are more complicated than in leaves. In ugt89c1 mutants, unknown flavonol derivative peaks (U1, U3, and U4) were detected only in leaves. These results may be due to a different distribution pattern of enzymes in the two organs that can modify flavonoids. For example, flavonol 7-O-rhamnosides were missing, and unknown flavonol derivatives (but not flavonol 3-O-glycosides) were accumulated in leaves of ugt89c1 mutants, suggesting that flavonol 3-O-glycosides are unstable and/or rapidly modified further in planta.
Structural Consideration of UGTs in Terms of Substrate Specificity-The functional identification of UGT89C1 as a flavonol 7RhaT allowed us to compare the sequences of flavonoid 3GlcT, flavonol 3RhaT, flavonol 7GlcT, and flavonol 7RhaT from A. thaliana (At3GlcT, At3RhaT, At7GlcT, and At7RhaT). The three-dimensional structure of grape flavonoid 3GlcT was very recently determined and suggests the presence of several key residues that interact with UDP-sugar and the flavonoid backbone (43). The amino acid residues Gln-375, Asp-374, and Thr-141 are proposed to interact with hydroxyl groups at the C-2 and C-3, C-3 and C-4, and C-6 positions of the glucose moiety of UDP-glucose, respectively. The residue that corresponds to Gln-375 of the grape 3GlcT is present in At3GlcT and At7GlcT, but the position is Asn in At3RhaT and His in At7RhaT (supplemental Fig. 3). Asp-374 in the grape 3GlcT is conserved in all Arabidopsis flavonoid UGTs, although the configurations of hydroxyl groups at the C-3 and C-4 positions of UDP-sugars are different. Thr-141 varies in Arabidopsis UGTs (Thr in At3GlcT, Ala in At3RhaT, Gly in At7GlcT, and Pro in At7RhaT). It was reported that the His residue corresponding to Gln-375 in grape 3GlcT is conserved among galactosyltransferases and may be involved in recognition of a hydroxyl group configuration at the C-4 position (44). Although the configuration at C-4 is the same in galactose and rhamnose, the His residue is found in At7RhaT, but not in At3RhaT. Thus, the residues involved in sugar donor specificity cannot be ascribed to a single amino acid residue.
Evolution of UGTs-The phylogenetic tree implies that flavonoid UGTs diverged from an ancestral gene into isogenes specific for the position of sugar attachment (i.e. divergence into 3-O-glycosyltransferase, 5-O-glycosyltransferase, 7-O-glycosyltransferase, etc.) and later gained the ability to use various UDP-sugars, such as UDP-glucose, UDP-rhamnose, and UDP-galactose.
Furthermore, at least in Arabidopsis, UGT enzymes involved in C-3 glycosylation acquired the ability to use UDP-sugars after plant species divergence. However, C-7 glycosyltransferases apparently diverged before speciation. The sequence identity between At3GlcT and At3RhaT (72%) is much higher than between At7GlcT and At7RhaT (30%). It was reported that plant UGTs are divided into two major groups, those containing the conserved intron A and those that primarily contain no introns (45). The former group consists of UGT71-UGT73, UGT79, and UGT88-92, and the latter consists of UGT74-UGT78 and UGT82-UGT87. This divergence suggests that flavonoid 3-O-glycosyltransferase (UGT78) and 7-O-glycosyltransferase (UGT73 and UGT89) evolved independently and/or at different rates.
Coexpression analysis indicated that UGT89C1 and UGT78D2 belong to the early biosynthetic genes and UGT75C1 belongs to the late biosynthetic genes, although UGT78D2 has the ability to glucosylate the C-3 position of both flavonols and anthocyanidins. This trait may be due to the predominant and earlier production of flavonols under normal growth conditions or may reflect the evolutionary past of UGTs. Chalcones and flavanones were first synthesized over 500 million years ago; flavonols followed later, and finally, anthocyanins appeared about 120 million years ago (46,47). This time line indicates that early genes probably appeared first, and late genes emerged afterward. UGT78D2 could have evolved contemporaneously with flavonols and could have later adapted to glycosylate anthocyanidins. In general, the classification of UGTs, at least in terms of substrates, may reflect the time point at which each class of secondary metabolite appeared.
Improved analytical instruments and techniques will lead to the discovery of other secondary compounds and a subsequent unraveling of a number of structural and biosynthetic mysteries in the near future. Integration of metabolomics, including target analysis, and transcriptomics may reveal complete metabolic pathways, including intermediates, and may also suggest the evolutionary history, and in turn, the regulation system for each identified gene, protein, or metabolic pathway.