Metabolomics, Pathway Regulation, and Pathway Discovery*

Metabolomics is a data-based research strategy, the aims of which are to identify biomarker pictures of metabolic systems and metabolic perturbations and to formulate hypotheses to be tested. It involves the assay by mass spectrometry or NMR of many metabolites present in the biological system investigated. In this minireview, we outline studies in which metabolomics led to useful biomarkers of metabolic processes. We also illustrate how the discovery potential of metabolomics is enhanced by associating it with stable isotopic techniques.

Metabolomics developed in the early 2000s as part of the omics movement, which transformed the research strategy of many biomedical disciplines. Hypothesis-based research remained the Golden Rule of biological investigation (Fig.  1A). However, in many cases, the formulation of hypotheses had become difficult or impossible for lack of sufficient information on the increasingly complex problems investigated. The data-based research strategy of metabolomics would yield biomarker pictures of the metabolome under basal and modulated conditions (Fig. 1B). Analysis of the biomarker pictures would, in favorable cases, allow the formulation of hypotheses and their subsequent testing. Hypothesis-based and data-based approaches are complementary and are best carried out iteratively to advance knowledge (1). Metabolomics applied to a given problem can sometimes yield unexpected findings that were not related to the original problem investigated (Fig. 1B). This can lead to new avenues of research.
Initial metabolomic studies were pioneered by Nicholson and co-workers (2-5), Fiehn and co-workers (6-8), and others (9, 10). Nicholson coined the word "metabonomics" to mean "the dynamic metabolic response of living systems to biological stimuli," in contrast to "metabolomics," viewed to mean "the analytical description of complex biological samples." As acknowledged by Nicholson, the two terms are used interchangeably. In the United States, the development of metabolomics was spurred by the 2003 Metabolomic Roadmap Initiative of the National Institutes of Health, the goal of which was to remove roadblocks to discovery in biomedical research.
Metabolomics evolved in different directions. In its initial and most commonly used format, metabolomics involves the non-targeted measurement, mostly by mass spectrometry and/or NMR, of the concentrations of all assayable metabolites present in a biological sample. Estimates of the number of metabolites in mammalian and plant cells range from 3000 to 8000 (Human Metabolome Database) (12). Of these, only a small fraction is currently assayable in most cases. A number of metabolites present in mammalian and plant cells are of exogenous origin: compounds derived from the intestinal microbiome, drugs, pesticides, and environmental pollutants. A second format targets classes or sets of metabolites, the concentrations of which are likely to provide useful information on the process investigated. A third format associates metabolomics (non-targeted or targeted) with stable isotope technologies to yield concentrations and labeling patterns of metabolites.
Metabolomic investigations use sophisticated analytical techniques to identify and evaluate the concentrations of large numbers of known and unknown metabolites. The identification of known metabolites is often difficult because pure standards of many compounds are not available. Also, the cost of setting up large collections of standards is prohibitive. The Metabolomic Standard Initiative of the National Institutes of Health and other groups are working on setting up collections of standards (and mixtures of standards) available to investigators (13-15). The problem is alleviated by the publication and commercialization of electronic libraries of metabolomic analyses by GC-MS (16) or LC-MS. These libraries list retention times and characteristic ions of hundreds of compounds. Each library is specific to one type of chromatographic column and, for GC-MS libraries, one type of derivatization and ionization (usually methoximation + trimethylsilylation and electron ionization). Many detected compounds are not identified and are listed as "mass spectral tags" or "events" with their retention times and ion spectra.
* This work was supported, in whole or in part, by National Institutes of Health Roadmap Grant R33DK070291 and Grant R01ES013925. This is the second article in the Thematic Minireview Series on Computational Systems Biology. This minireview will be reprinted in the 2011 Minireview Compendium, which will be available in January, 2012.
1 To whom correspondence should be addressed. E-mail: hxb8@case.edu.
Some of the reported concentrations are absolute, i.e. expressed in chemical units. This is achieved when stable isotope-labeled internal standards or unlabeled non-biological standards of analog compounds are available. Such standards allow the use of multiple-point calibration curves. This is doable for only a small number of metabolites. In many cases, the reported concentrations are relative to one or one of a few reference standards added to the sample. The reference standards are either unnatural compounds or heavy mass isotopomers of natural compounds, e.g. [13C6]glucose or 3-hydroxy[2H6]butyrate. The linearity of the (signal of analyte)/(signal of reference standard) ratios cannot always be assessed, especially (i) when standards of analytes are not available and (ii) when unidentified compounds are monitored. One option proposed for metabolomic studies in microorganisms is to use, as a mixture of labeled internal standards, an extract of Saccharomyces cerevisiae grown on fully 13C-labeled substrates (17). Then, the mass isotopomer distribution of the labeled internal standards does not overlap with the mass isotopomer distribution of the corresponding naturally labeled analytes with at least three carbons.
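The three-carbon threshold for non-overlapping mass isotopomer distributions can be illustrated with a short calculation. The sketch below is not taken from Ref. 17; it assumes a natural 13C abundance of about 1.07%, treats the carbon skeleton only (isotopes of H, N, O, and derivatization atoms such as Si are ignored), and assumes a perfectly enriched standard. The function names are hypothetical.

```python
from math import comb

NATURAL_13C = 0.0107  # approximate natural abundance of 13C (assumption of this sketch)


def carbon_mid(n_carbons, p13c=NATURAL_13C):
    """Binomial mass isotopomer distribution (M0..Mn), carbon skeleton only."""
    return [
        comb(n_carbons, k) * p13c**k * (1 - p13c) ** (n_carbons - k)
        for k in range(n_carbons + 1)
    ]


def overlap_with_fully_labeled_standard(n_carbons):
    """Fraction of the naturally labeled analyte that appears at M+n,
    the mass at which a fully 13C-labeled standard would be measured."""
    return carbon_mid(n_carbons)[n_carbons]


if __name__ == "__main__":
    for n in range(1, 7):
        frac = overlap_with_fully_labeled_standard(n)
        print(f"{n} carbons: natural analyte signal at M+{n} = {frac:.2e}")
```

Under these assumptions, about 1% of a one-carbon analyte falls at the mass of its labeled standard, whereas for three carbons the fraction is on the order of one part per million, which is consistent with the threshold stated above.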
The absolute or relative concentrations of known and unknown metabolites are analyzed by statistical methods (principal component analysis, partial least squares, etc.). This allows sets of samples to be differentiated. The data of statistical analyses are presented as graphs and heat maps (18). The statistical analysis of metabolomic data is beyond the scope of this minireview.

Classical Metabolomics
As a tool to generate a hypothesis to be tested, metabolomics is not a quick route to discovery because it imposes a sometimes long and arduous first phase in an investigation. This explains why the vast majority of metabolomic studies published to date (namely >4000 papers) are limited to the first phase, i.e. biomarker discovery. The interpretation of biomarker profiles is often very difficult, especially when many concentrations vary between groups, such as diabetic versus control (19). In such cases, the formulation of hypotheses to explain the variations in metabolite profiles is difficult and frequently impossible. The above statements are not meant to deny the value of biomarker profiling in biological, medical, and pharmacological investigations, as illustrated by the following examples, but rather highlight a challenge for the field going forward.
Sreekumar et al. (18) recently conducted an extensive metabolomic study of prostate cancer. They reported that the content of sarcosine (N-methylglycine) in prostate biopsies or in urine allows the differentiation of benign prostate conditions, noninvasive carcinoma, and invasive carcinoma (see Fig. 3A of Ref. 18). Also, the sarcosine contents of invasive cancer cell lines were higher than those of benign prostate epithelial cells. The authors concluded that "components of the sarcosine pathway may have potential as biomarkers of prostate cancer progression and serve as new avenues for therapeutic intervention." If this finding is confirmed, testing for sarcosine in prostate biopsies and urine could prevent the unnecessary and debilitating treatment of many patients with noninvasive prostate cancer. This is a major public health problem because many men aged 70 and above have noninvasive prostate carcinoma.
In rats injected with a large dose of acetaminophen, the formation of N-acetyl-p-benzoquinone imine leads to depletion of liver glutathione. Soga et al. (20) reported that the decrease in the plasma and liver concentrations of glutathione is mirrored by increases in the concentration of ophthalmate, a glutathione analog (glutamate/2-aminobutyrate/glycine). Because glutathione and ophthalmate are synthesized by the same enzymes (Fig. 2), the authors proposed the following sequence of events: oxidative stress → depletion of glutathione → derepression of γ-glutamylcysteine synthetase → depletion of cysteine → activation of ophthalmate synthesis. They also hypothesized that ophthalmate "may be a new biomarker for oxidative stress." This is a promising avenue of research.
In a study on the regulation of gluconeogenesis and the citric acid cycle in perfused rat livers, Yang et al. (21) found that two inhibitors of gluconeogenesis form adducts with keto acids, which are intermediates of gluconeogenesis. Aminooxyacetate, which inhibits the aminotransferases involved in gluconeogenesis from lactate (22), forms adducts with pyruvate, α-ketoglutarate, and oxaloacetate (Fig. 3A). The formation of the pyruvate adduct (carboxymethoxylamine) had been hypothesized in 1972 but not demonstrated (23). Mercaptopicolinate, a strong inhibitor of P-enolpyruvate carboxykinase (24), forms an adduct with pyruvate (mercaptopicolinyl pyruvate hemithioketal) (Fig. 3B). The rapid formation of these adducts was demonstrated by in vitro experiments in which mixed solutions of keto acid and inhibitor were infused, just after mixing, in the source of a mass spectrometer (21). These adducts may exert metabolic effects unrelated to their effect on gluconeogenesis.

Isotopomer Analysis Adds Value to Metabolomics
The steady-state concentration of a metabolite can result from many combinations of its rate(s) of synthesis and its rate(s) of disposal. Increases or decreases in the concentrations of metabolites are frequently ascribed to increases or decreases in the fluxes through the pathways these metabolites are part of. This is seldom justified in the absence of flux measurements. For example, when an anaplerotic substrate passes through some of the reactions of the citric acid cycle, the concentrations of only some of the cycle intermediates increase. Also, in rat hearts perfused with increasing concentrations of propionate, only the malate concentration increases (25). This finding does not allow any conclusion to be made on the modulation by propionate of the flux of acetyl oxidation in the citric acid cycle. Therefore, in the absence of isotopic tracers, it is seldom possible to infer variations in flux rates from variations in metabolite concentrations. Because mass spectrometry and NMR allow the calculation of isotopic enrichments, the measurement of metabolic fluxes with substrates enriched with stable isotopes does not require the acquisition of additional equipment.
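The point that a steady-state concentration does not determine flux can be made concrete with a deliberately simple one-pool model. The sketch below is an illustration only, not a model used in the cited studies: it assumes zero-order synthesis at rate v and first-order disposal with rate constant k, so that the steady-state concentration is v/k, and it assumes a switch to a fully labeled precursor at time zero, so that the fractional enrichment of the pool rises as 1 - exp(-kt). All names, rates, and units are hypothetical.

```python
import math


def steady_state_concentration(synthesis_rate, disposal_constant):
    """Pool size at steady state for zero-order synthesis and first-order disposal."""
    if disposal_constant <= 0:
        raise ValueError("disposal_constant must be positive")
    return synthesis_rate / disposal_constant


def tracer_enrichment(disposal_constant, minutes):
    """Fractional enrichment of the pool after switching, at time zero,
    to a fully labeled precursor (pool already at steady state)."""
    return 1.0 - math.exp(-disposal_constant * minutes)


if __name__ == "__main__":
    # Two hypothetical states: identical concentration, fourfold difference in flux.
    states = {"slow turnover": (2.0, 0.5), "fast turnover": (8.0, 2.0)}
    for name, (v, k) in states.items():
        conc = steady_state_concentration(v, k)
        enrich = tracer_enrichment(k, minutes=1.0)
        print(f"{name}: concentration={conc:.1f}, flux={v:.1f}, "
              f"enrichment at 1 min={enrich:.2f}")
```

In this toy case, both states have the same concentration, yet the enrichment after 1 min is about 39% in the slow-turnover state and about 86% in the fast-turnover state. Only the labeling data distinguish the two fluxes.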
Another advantage of using 13C-labeled substrates in metabolomic experiments is the possibility of discovering new metabolites, reactions, and pathways from the pattern of label distribution in the metabolome (Fig. 1B). A typical strategy involves comparing electron ionization GC-MS traces, in full ion scan mode, from experiments conducted with unlabeled substrate, fully 13C-labeled substrate, and a 50:50 mixture of unlabeled and fully 13C-labeled substrate. Using a GC-pyrolysis-isotope ratio mass spectrometer also equipped with a quadrupole detector allows the identification of peaks labeled from the original substrate. This is achieved by superposing the quadrupole and the isotope ratio traces. Also, non-targeted fate detection (26) enables the quantitative detection of all measurable metabolites derived from a specific labeled compound. "Without a priori knowledge of a reaction network or compound library, the mass isotopomer distribution of labeled compounds provides information about relative fluxes into each metabolite pool (even when the labeled metabolite is not identified)." The technique is applicable to substrates labeled with 13C, 15…
Chen et al. (27) searched for new metabolites of acetaminophen by injecting mice with 2,3,5,6-[2H4]acetaminophen, [2H3]acetyl acetaminophen, or the unlabeled compounds. Using a combination of mass isotopomer analysis, accurate mass measurement, and tandem mass spectrometry fragmentation, they identified four new metabolites associated with acetaminophen toxicity. The distribution of these metabolites was different in Cyp2e1-null mice compared with wild-type mice. The data suggest a role of Cyp2e1 in the oxidative stress induced by high doses of acetaminophen.
Boren et al. (28) investigated the mechanism by which butyrate, generated by the intestinal microbiome, induces cell differentiation in HT29 human colon adenocarcinoma cells. They incubated HT29 cells with [1,2-13C2]glucose or [1,2-13C2]butyrate in the presence of the other unlabeled substrate. [1,2-13C2]Glucose is an interesting tracer because the derived ribose units incorporated into nucleic acids are singly or doubly labeled when ribose is formed by the oxidative or non-oxidative branch of the pentose phosphate pathway, respectively (29). The mass isotopomer distribution of long-chain fatty acids allowed the calculation of the contributions of glucose and butyrate to the acetyl-CoA that is used for fatty acid synthesis and for oxidation in the citric acid cycle. In HT29 cells, increasing concentrations of butyrate inhibited glucose uptake, glucose oxidation, and nucleic acid ribose synthesis. The butyrate carbon replaced the glucose carbon for de novo fatty acid synthesis and citric acid cycle flux. These effects were not observed in MIA pancreatic adenocarcinoma cells, which are butyrate-insensitive. The data suggest, "The mechanism by which colon carcinoma cells acquire a differentiated phenotype is through a replacement of glucose for butyrate as the main carbon source for macromolecule biosynthesis and energy production."
A series of isotopic studies have demonstrated the reversibility of the citric acid cycle reactions between citrate and α-ketoglutarate (Fig. 4A), despite the fact that the equilibrium of isocitrate dehydrogenase is far toward α-ketoglutarate. In rat livers perfused with 1 mM [13C5]glutamine or [13C5]glutamate, a substantial fraction of citrate was M5, i.e. labeled on five carbons (30). Such M5 labeling of citrate could not occur via the forward reactions of the citric acid cycle. One could argue that the increase in α-ketoglutarate concentration induced by glutamine or glutamate displaced the unfavorable equilibrium of isocitrate dehydrogenase. However, in perfusions with 10% 13C-enriched bicarbonate (and no glutamine or glutamate), the degree of labeling of C-6 of citrate could only be explained by the reversal of isocitrate dehydrogenase + aconitase (31).
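The argument that M5 citrate cannot arise from the forward reactions of the cycle amounts to simple carbon bookkeeping, sketched below. This is an illustration rather than the analysis of Refs. 30 and 31. It assumes that α-ketoglutarate is uniformly labeled (M5), that acetyl-CoA and CO2 are unlabeled by default, and that the forward route loses C-1 of α-ketoglutarate as CO2 at α-ketoglutarate dehydrogenase before the remaining four carbons reach oxaloacetate. The function name and its parameters are hypothetical.

```python
AKG_LABELED_CARBONS = 5  # alpha-ketoglutarate from [13C5]glutamine or [13C5]glutamate


def citrate_mass_shift(route, acetyl_coa_label=0, co2_label=0):
    """Number of 13C atoms expected in citrate formed from M5 alpha-ketoglutarate.

    route: "forward" for the oxidative direction of the citric acid cycle,
           "reverse" for isocitrate dehydrogenase + aconitase running backward.
    """
    if route == "forward":
        # C-1 is released as CO2; the four-carbon unit becomes oxaloacetate,
        # which condenses with acetyl-CoA to form citrate.
        oxaloacetate_label = AKG_LABELED_CARBONS - 1
        return oxaloacetate_label + acetyl_coa_label
    if route == "reverse":
        # Reductive carboxylation keeps all five carbons and adds one from CO2.
        return AKG_LABELED_CARBONS + co2_label
    raise ValueError("route must be 'forward' or 'reverse'")


if __name__ == "__main__":
    for route in ("forward", "reverse"):
        print(f"{route}: citrate M{citrate_mass_shift(route)}")
```

With unlabeled acetyl-CoA and CO2, the forward route yields M4 citrate and the reverse route yields M5 citrate. The sketch also shows why the assumption matters: if acetyl-CoA itself acquired label, the forward route could produce heavier citrate, so the inference rests on the labeling state of the other precursors.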
The reversibility of the reactions between citrate and α-ketoglutarate allows some cells to synthesize long-chain fatty acids from glutamine via citrate and ATP-citrate lyase. For example, in brown adipocytes incubated with [13C5]glutamine or [1-13C]glutamine, 90% of the flux of glutamine to lipids occurs via the following sequence: glutamine → glutamate → α-ketoglutarate → isocitrate → citrate → acetyl-CoA → fatty acids (Fig. 4A) (32).

Conclusions
During the pre-genomics era, metabolic research progressively evolved from (i) the identification of a fairly small number of metabolites, to (ii) investigations of factors that result in variations of the concentrations of these metabolites, to (iii) investigations with radioactive and stable isotopes to calculate metabolic fluxes and to identify pathway steps and regulatory mechanisms. In reflecting on the above evolution, one concludes that metabolic research has come full circle over the last century: from the study of the variations of the concentrations of fairly small numbers of metabolites to the study of the variations of the concentrations of very large numbers of metabolites (metabolomics). On the basis of the historical evolution of research strategies, one can predict that present-day metabolomics is just a step in the logical progression to the integration between metabolomics and the most advanced isotopic techniques, i.e. isotopomer analysis. The direct and rapid move to this next level of metabolomics has enormous potential for increasing biomedical knowledge and improving public health.
FIGURE 4. A, scheme of the citric acid cycle and lipogenesis via ATP-citrate lyase. The scheme emphasizes (i) the reversibility of the conversion of citrate to isocitrate (ICIT) and α-ketoglutarate via aconitase and isocitrate dehydrogenase (ICDH) and (ii) the set of reactions by which carbons 4 and 5 of glutamine are used for fatty acid (FA) and sterol synthesis in some mammalian cells (red arrows). B, scheme of the catabolism of 4-hydroxy acids with at least five carbons using 4-hydroxy[3,4-13C2]nonanoate as an example (modified from Ref. 33). Carbons 3 and 4 of the substrate are colored red and green, respectively, to facilitate the tracing of their fates through pathways A and B. Pathway A includes 4-phosphononanoyl-CoA. Note that the doubly labeled substrate forms acetyl-CoA, part of which is doubly labeled (M2) via pathway A and singly labeled (M1) via pathway B. Formate, derived from carbon 3 of the substrate, is formed via pathway B.