1H NMR Metabolomics Analysis of Glioblastoma Subtypes

Background: Unpredictable clinical behavior of glioblastoma multiforme suggests distinct molecular subtypes. Results: Metabolic profiles of different glioblastoma lines indicate distinct subtypes correlated with gene expression differences. Conclusion: A subset of metabolites can be used to distinguish between four subtypes of glioblastomas. Significance: Metabolic profiling of cancers provides a way for subtype determination with possible diagnostic and prognostic applications. Glioblastoma multiforme (GBM) is the most common form of malignant glioma, characterized by unpredictable clinical behaviors that suggest distinct molecular subtypes. With the tumor metabolic phenotype being one of the hallmarks of cancer, we have set upon to investigate whether GBMs show differences in their metabolic profiles. 1H NMR analysis was performed on metabolite extracts from a selection of nine glioblastoma cell lines. Analysis was performed directly on spectral data and on relative concentrations of metabolites obtained from spectra using a multivariate regression method developed in this work. Both qualitative and quantitative sample clustering have shown that cell lines can be divided into four groups for which the most significantly different metabolites have been determined. Analysis shows that some of the major cancer metabolic markers (such as choline, lactate, and glutamine) have significantly dissimilar concentrations in different GBM groups. The obtained lists of metabolic markers for subgroups were correlated with gene expression data for the same cell lines. Metabolic analysis generally agrees with gene expression measurements, and in several cases, we have shown in detail how the metabolic results can be correlated with the analysis of gene expression. Combined gene expression and metabolomics analysis have shown differential expression of transporters of metabolic markers in these cells as well as some of the major metabolic pathways leading to accumulation of metabolites. Obtained lists of marker metabolites can be leveraged for subtype determination in glioblastomas.

Glioblastoma multiforme (GBM) is the most common form of malignant glioma, characterized by unpredictable clinical behaviors that suggest distinct molecular subtypes. With the tumor metabolic phenotype being one of the hallmarks of cancer, we have set upon to investigate whether GBMs show differences in their metabolic profiles. 1 H NMR analysis was performed on metabolite extracts from a selection of nine glioblastoma cell lines. Analysis was performed directly on spectral data and on relative concentrations of metabolites obtained from spectra using a multivariate regression method developed in this work. Both qualitative and quantitative sample clustering have shown that cell lines can be divided into four groups for which the most significantly different metabolites have been determined. Analysis shows that some of the major cancer metabolic markers (such as choline, lactate, and glutamine) have significantly dissimilar concentrations in different GBM groups. The obtained lists of metabolic markers for subgroups were correlated with gene expression data for the same cell lines. Metabolic analysis generally agrees with gene expression measurements, and in several cases, we have shown in detail how the metabolic results can be correlated with the analysis of gene expression. Combined gene expression and metabolomics analysis have shown differential expression of transporters of metabolic markers in these cells as well as some of the major metabolic pathways leading to accumulation of metabolites.

Obtained lists of marker metabolites can be leveraged for subtype determination in glioblastomas.
Tumor cells have a remarkably different metabolism than the tissues they derive from. Many key oncogenic signaling pathways converge to create this change to support growth and survival of cancer cells (1). The unique metabolic phenotype associated with cancer is in general characterized by (a) high glucose uptake; (b) increased glycolytic activity; (c) decreased mitochondrial activity for energy production; (d) low bioenergetic expenditure; (e) increased phospholipid turnover, altered lipid profile, and increase of de novo lipid synthesis; (f) increased amino acid transport along with elevated protein and DNA synthesis; (g) increased hypoxia; and (h) increased tolerance to reactive oxygen species (1,2). However, these general characteristics differ widely across cancer types and subtypes (1,2). Understanding how cancer subtypes derive energy and necessary building blocks, even from a nutrient-depleted environment, is of fundamental importance for the development of appropriate therapies and diagnostic approaches (2)(3)(4). Cancer metabolic phenotype has been explored for several years in cancer diagnostics performed by magnetic resonance spectroscopy as well as PET scanning. With better description of metabolic differences among tumor subtypes, these methods can be used for noninvasive subtyping of tumors and thus improved patient stratification. One of the major applications of noninvasive diagnosis and follow-up is in brain tumors. Metabolic profiling of brain normal and tumor samples has been performed for a number of years with early studies published more than three decades ago (Ref. 5 and references therein). Recent studies have further highlighted the possibility of diagnosing major brain tumor types in a noninvasive manner (6,7).
Glioblastoma multiforme (GBM) 4 is the most common form of malignant brain cancer in adults with very poor prognosis and a dire need for improvement in patient stratification and treatment. Several groups have turned to high dimensional profiling studies to better describe GBMs. Recent work performed as part of the Cancer Genome Atlas Network has analyzed genomic abnormalities in GBM samples (8). This analysis has identified four clinically relevant subtypes of GBMs that were characterized by abnormalities in PDGFRA, IDH1, EGFR, and NF1 among other alterations. Platelet-derived growth factor receptors (PDGFRs) are catalytic receptors that have intracellular tyrosine kinase activity. They regulate many biological processes including embryonic development, angiogenesis, cell proliferation, and differentiation. Isocitrate dehydrogenases (IDHs) function at a crossroad of cellular metabolism and are involved in lipid synthesis, cellular defense against oxidative stress, oxidative respiration, and oxygen-sensing signal transduction. IDHs catalyze the oxidative decarboxylation of isocitrate to ␣-ketoglutarate and the reduction of NAD(P)ϩ to NAD(P)H. Mutated IDH1 and IDH2 possess a different active site that appears to promote metabolism of ␣-ketoglutarate (product of wild-type IDH1 and IDH2) into the oncometabolite 2-hydroxyglutarate. Epidermal growth factor receptor (EGFR) is a transmembrane glycoprotein that binds to epidermal growth factor. Among its many functions, EGFR has been linked to the mechanism that reduces cell membrane permeability to choline (9,10). One of the most common oncogenic events observed in GBMs is the amplification and overexpression of EGFR kinase that occurs in 40 -60% of primary GBMs (11). Furthermore, a truncated version of this kinase (EGFRvIII) is expressed in 20 -30% of GBM patients and results in constitutive activation of this signaling pathway (12). The neurofibromin 1 (NF1) gene is suggested to act as a negative regulator of the Ras signaling transduction pathway (13). Gain of function for Ras, among other oncogenes, induces HIF-1 expression by deregulation of the mammalian target of rapamycin (mTOR) pathway (1). Abnormalities in all four of the aforementioned molecular targets can influence brain tumor cell metabolism, and this can possibly be observed through analysis of metabolic profiles.
High throughput metabolite profiling (metabolomics) provides unbiased analysis of metabolic differences. Metabolomics is particularly important in the search for biomarkers that can eventually be used for in vivo diagnosis and prognosis with methods such as magnetic resonance spectroscopy. Currently, magnetic resonance spectroscopy is the only available method for in vivo molecular analysis of brain tumors and is gaining an increasing role in clinical assessment of patients (6). The first step toward devising a metabolite-based noninvasive glioblastoma subtyping method is to identify metabolite-based subtypes and to determine major metabolic markers in cancer subtypes. Cell culture analysis provides an excellent, controlled, homogeneous setting for initial identification of metabolic profiles and is a prerequisite to the analysis of more complex tumor tissue or biopsy samples. NMR analysis of brain tumor cell cultures has been performed for number of years with some of the first detailed studies appearing in the early 1990s (14). Brain tumor cell cultures provide a way for creating a sufficient amount of sample through passaging of cells while avoiding problems caused by heterogeneity and confounders. In 1995, Florian et al. (7,15) used in vitro 1 H NMR spectroscopy and chromatographic analyses to compare metabolic properties of three types of human brain and nervous system tumor cell lines. Obtained spectra from meningiomas (six lines), neuroblastomas (three lines), and glioblastomas (five lines) displayed some similarities such as the presence of signals from leucine-, isoleucine-, valine-, threonine-, lactate-, acetate-, glutamate-, glycine-, and cholinecontaining compounds. Meningioma spectra featured relatively high signals from alanine. Intense signals from creatine were present in neuroblastoma but not in glioblastoma spectra. Statistically significant differences were found in the amounts of alanine, glutamate, creatine, phosphorylcholine, and threonine among the types of tumors examined. Overall, it has been observed that neuroblastoma, glioma, and meningioma cells display a low concentration of normal neural metabolites, such as N-acetylaspartate, ␥-amino butyrate (GABA), and taurine. The most discriminatory metabolites for tumor cell types include total creatine (creatine and phosphocreatine), total choline (phosphocholine, glycerophosphocholine, and choline), alanine, taurine, and glutamate.
In this work, we have included nine different glioblastoma cell lines and explored the hypothesis that GBM subtypes can be determined from metabolic profiles. To this end, we have analyzed in detail the specific metabolic characteristics of different glioblastoma cell lines and determined significant metabolic differences among metabolically resolved GBM subtypes. Metabolic profiles were compared with publicly available gene expression data to determine relationships between metabolic and gene expression differences among proposed subtypes of GBMs.

EXPERIMENTAL PROCEDURES
Cell Lines and in Vitro Culture Conditions-Human glioma cells A172, BS149, Hs683, LN18, LN229, LN319, LN405, U343MG, and U373 were maintained in DMEM supplemented with 10% fetal bovine serum (FBS) and antibiotics (Invitrogen). The "BS" series was generated at the University of Basel, whereas the "LN" series was generated by Erwin Van Meir in Lausanne, Switzerland. All cell lines were a kind gift of Adrian Merlo (Laboratory of Molecular Neuro-oncology, University of Basel, Basel, Switzerland). Prior to metabolite isolation, 1 ϫ 10 6 cells were seeded in quintuplicate in 10-cm culture dishes and incubated for 48 h at 37°C and 5% CO 2 . Cells were then harvested by scraping and rinsed with 5 ml of PBS. The mixture was centrifuged at 4,000 relative centrifugal force for 1 min. Supernatant was discarded, and the cell pellet was again rinsed with 5 ml of PBS. Upon centrifugation at 4,000 relative centrifugal force for 1 min, cell pellets were kept on ice for 5 min before being resuspended in 1 ml of ice-cold 50% acetonitrile. Cell suspensions were kept on ice for 10 min before centrifugation at 16,000 relative centrifugal force for 10 min at 4°C. The aqueous acetonitrile extract solutions were dried down under a stream of N 2 .
Total RNA Isolation and PCR Amplification-Total RNA isolation and subsequent cDNA synthesis from GBM cell lines were performed as described previously (16). The primers listed in Table 1 were designed and synthesized by Integrated DNA Technologies (Coralville, IA) ( Table 1). PCR was conducted using the EconoTaq PLUS (Lucigen, Middleton, WI) reagents as per the manufacturer's instructions. Cycles performed for amplification consisted of an initial step of 5 min at 94°C followed by the amplification steps of 94°C for 30 s, 56°C (EGFR, PDGFRA, and GAPDH) or 60°C (NF1 and IDH) for 30 s, and 72°C for 30 s (PDGFRA, NF1, IDH1, and GAPDH) or 90 s (EGFR); the final step was 72°C for 5 min. PCR products were separated on a 1% agarose gel. Fragments of ϳ100 -280 bp (PDGFRA, NF1, IDH1, and GAPDH) and ϳ1000 bp (EGFR) corresponding to the expected fragment length were obtained.
NMR Experimentation-The residue obtained after drying was dissolved in 0.6 ml of deuterium oxide (Aldrich, 99.96 atom % 2 H), pipetted into a 5-mm NMR tube for NMR analysis. All 1 H NMR measurements were performed on a Bruker Avance III 400 MHz spectrometer at 298 K. One-dimensional spectra were obtained using a gradient water presaturation method with 512 scans (pulse sequence zgesgp). NMR spectra were processed using Mnova with exponential apodization (exponent 1); global phase correction; Bernstein-Polynomial baseline correction; Savitzky-Golay line smoothing; and normalization using total spectral area as provided in Mnova. Spectral regions from 0 to 9 ppm were included in the normalization and analysis. Two-dimensional spectra including TOCSY, two-dimensional JRES, and HMQC using standard methods provided in TopSpin software (Bruker) were performed on one sample of each explored cell type with 70, 32, and 1,000 scans (number of transients per free induction decay, i.e. number of free induction decay acquired for each t 1 data point), respectively. Mixing times for TOCSY experiments were adjusted to 80 ms. For precision and reproducibility, a total of 45 NMR experiments were carried out in quintuplicate (n ϭ 5) on nine cell lines. Metabolite assignments were performed using two-dimensional measurements as well as one-dimensional 1 H spectral data in comparison with metabolites database data and literature information.
Data Analysis-Data preprocessing including data organization, removal of undesired areas, and binning as well as data presentation was performed with Matlab vR2010b (MathWorks). Minor adjustments in peak positions (alignment) between different samples were performed using in-house alignment software. 5 Principal component analysis (PCA) as well as fuzzy K-means cluster analysis were done using the Matlab platform as described previously (18). Feature selection was done with the significance analysis for microarrays (SAM) method (19). Peak assignment was performed using several methods developed in our group and elsewhere and was based on metabolic NMR databases (20 -22). Spectra for 41 metabolites used in quantification were obtained from the Human Metabolome Database or Biological Magnetic Resonance Databank and further analyzed visually and compared with the obtained spectra.
Metabolite Quantification-An automated method for quantification based on multivariable linear regression of spectra with appropriately aligned metabolite data from databases was developed and utilized in the study. The assumption behind this approach is that the spectrum of a mixture is the same as the combination (sum) of spectra of individual components measured under the same conditions. If there are no chemical interactions among compounds in the mixture, this assumption generally holds, and the spectra of standards can be used for assignment and quantification of spectrum of a mixture measured under the same conditions (pH, temperature, etc.) and with the same NMR pulse sequence. Here relative metabolite concentrations were estimated using nonlinear curve fitting with the multivariate least squares approach. The linear regression result was used as the starting point, and the model was constrained to concentrations: c Ϯ 0. NMR spectra of the mixtures (samples) are modeled as a sum of spectra for components (metabolites) in the mixture, i.e.
where c n is the concentration of the component, v is the spectral frequency point, and L n (v) is the spectrum of metabolite n at v. Each compound can result in many peaks within spectra of mixtures thus having extensive overlapping of peaks. The deconvolution of spectra of mixtures, such as in metabolomics, with many strongly overlapping lines, possibly with an unknown number of lines and atomic groups, each with a different line width, is extremely difficult, and thus, it is important to determine an optimal solver for this problem. Generally, the solution is found by minimizing the square root of difference between the model and real spectrum ʈI Ϫ I ʈ 2 while using c n and possibly minor frequency change as variables or by solving the Equation 1 for c n . Several different methods for multivariate regression analysis provided in Matlab vR2010b were tested including partial least squares regression, robust fit, Levenberg-Marquardt curve fitting, and linear programming. The best result, i.e. the model with the minimal error determined as ʈI Ϫ I ʈ 2 , was obtained with the Levenberg-Marquardt curve fitting, and this method was used for quantification of metabolic data used in further analysis. The Levenberg-Marquardt method is specifically designed for solving nonlinear curve fitting problems in the least squares sense and additionally allows the inclusion of constraint that concentrations have to be non-negative. Quantification error is estimated by performing the multivariate linear regression analysis as described above but with one metabolite at the time removed from the analysis. In this way, we are estimating errors caused by omitting metabolites from the analysis and the uniqueness of spectral features for metabolite quantification. From quantitative values for each metabolite recalculated for each n Ϫ 1 metabolite subgroup, error is calculated as 5 J. Hines and M. Cuperlovic-Culf, submitted.
where n ϭ 41 is the number of metabolites; L F i is the mean value of concentration for metabolite i across all samples obtained when the complete set of metabolites is analyzed; and L F M i,k is the same concentration but when metabolite k is excluded from the analysis.
Microarray Data Analysis-Microarray datasets used in this work were previously published and validated for accuracy in RNA expression measurements (Ref. 16; available from GEO Databases in MIAMI format ID: GSE 15824). The description of methods used for sample processing, microarray experimentation, and validation is given in the original publication. These microarray data were analyzed using TMeV2.2 and Pathway Studio 8.0 (Ariadne Inc.). TMeV is the general data analysis software that includes many features such as normalization, clustering, classification, and statistical analysis. Pathway Studio is the commercial software that determines relationships among different types of biological molecules as well as biomedical terms based on extensive literature searches.

RESULTS AND DISCUSSION
NMR metabolite analysis was performed on nine GBM cell lines. Five independent (biological) replicates of cells were grown under the same optimal conditions. Selecting cell cultures with the same growth conditions ensured that there were no differences in the metabolic profiles caused by cellular environment. The metabolites were extracted from replicates of each of the nine cell lines using the procedure described under "Experimental Procedures." One-dimensional 1 H NMR spectra for 45 metabolic extracts were measured and used in the qualitative and quantitative analysis. Additionally, TOCSY, two-dimensional JRES, and HMQC spectra were obtained for one replicate of each cell type, and these two-dimensional data aided in metabolite assignment. The one-dimensional 1 H NMR spectra obtained for each cell line and each replicate are shown in Fig. 1. Spectra of biological replicates show negligible differences.
One-dimensional and two-dimensional spectra allow detailed assignment of metabolites. Metabolite assignment was based on literature data (23) and database information (20, 21) using both one-dimensional and two-dimensional measurements. Obtained metabolite assignments as well as spectra for each metabolite are shown in Fig. 2. These assignments were used for metabolite quantification described under "Experimental Procedures." Metabolic profiles analysis was performed qualitatively, directly on the spectra, as well as quantitatively, on quantified metabolic concentrations. Sample spectra and quantified metabolite concentrations were analyzed with unsupervised clustering methods to determine the similarities and differences between sample types in an unbiased fashion.
In the first level of analysis, the complete spectra were normalized using the total peak area normalization method. Experimental conditions were equalized (in terms of pH and temperature), resulting in spectra with complete chemical shift overlap that was verified using our spectral alignment tool (17). Thus, binning was not necessary, and the analysis was performed on complete spectra, ensuring that all measured spectral features were considered. The qualitative analysis of the major variances in the spectra was performed directly by using PCA (Fig. 3) as well as by clustering samples with the hierarchical clustering (HCL) (Fig. 4A) and fuzzy K-means methods (FKM) (Fig. 4B). Fig. 3 shows the plot of principal components PC1, PC2, and PC3 for the five replicates of the nine cell lines studied. Grouping of replicates for each cell type is clear. In addition, there appears to be assembling of some cell types. Additional results of true clustering analysis are shown in Fig. 4. Fig. 4A presents sample clusters obtained using the HCL method. HCL allows feature grouping automatically from data without user input of cluster numbers. According to the HCL results, samples from the nine cell types can be grouped into four clusters. Finally, spectra were clustered using the FKM method, which was introduced to NMR metabolomics in Ref. 18. FKM calculates memberships or belongings of each feature (in this case sample) to each of the user-defined numbers of clusters. In this method, each feature belongs to some extent (defined by the membership value) to each group where a membership value near 1 indicates strong belonging and close to 0 indicates weak belonging or no belonging. Fig. 4B shows membership values for each sample for four clusters obtained using the FKM method. FKM memberships were calculated using the "fuzzification" factor m ϭ 1.8 (where m ϭ 1 defines crisp clustering), which provided the good balance between crisp and fuzzy clustering (24).
Cell replicates were co-clustered for all three methods. Although this can be expected, it still highlights the technical  consistency of the experimental manipulations and, even more importantly, biological control of the metabolism performed by cells of the same type under similar conditions. Different cell lines appeared to cluster within subtypes of GBMs with four distinct groups: group 1, LN229 and LN319; group 2, HS683 and LN405; group 3, U343, A172, and LN18; and group 4, U373 and BS149. The same results were obtained with all three unsupervised methods. FKM, and to some extent PCA, showed that LN405 co-clusters mostly with HS683 but has some similarities with U343, A172 and LN18 group as well. Other samples appeared to be strongly associated with only one cluster.
To determine specific metabolites that are distinct between these groups, we have performed metabolite quantification from spectral data. For this, we have utilized the multivariate linear regression fitting method, which tries to create models of data from metabolite (standard) spectra collected in databases. Metabolite spectra were manually aligned with the measurements to reduce errors caused by minor changes in peak positions. Relative metabolite concentrations were obtained as optimal multipliers for normalized spectra obtained in the fitting. Forty-one metabolites in total were used in the fitting. The list of analyzed metabolites and their spectra is shown in Fig. 2. Relative concentrations of each metabolite for each cell line as well as error estimates are shown in supplemental Table 1. HCL clustering of quantitative metabolic data following normalization is shown in Fig. 5. Comparison between PCA and FKM analysis for spectral and quantified data is shown in the supple-mental figures. Unsupervised analysis of metabolic data leads to the same sample groups as did analysis of spectral data.
The sample clustering obtained from quantified metabolite data was exactly the same as obtained with qualitative, spectral data. This shows that significant variances for sample groupings are preserved in the metabolite quantification procedure. Although there are some clear metabolic differences between the four groups visible from analysis of all metabolites (Fig. 5), we have performed an additional step of feature selection to determine metabolites with the most significant concentration changes. A selection of the metabolites with the most significantly different concentrations among these four groups of samples was performed using the SAM method included in the TMeV gene expression analysis software package. Although both SAM and TMeV were originally developed for the analysis of gene expression data, these tools are universal data analysis methods that can be used for both qualitative, i.e. spectral, or quantitative, i.e. peak, metabolomics data. Selected metabolites are shown in Fig. 6. Table 2 presents the lists of overconcentrated metabolites for each group of samples.
Our list of significant metabolites ( Table 2) includes some of the metabolites that are known to be significantly different in cancers relative to normal cells such as cholines, creatinines, and lactic acid. It is interesting to notice, however, that there are significant differences even in these major metabolites as well as many other metabolites between the four GBM subtypes. Group 1 samples have significantly higher concentrations of FIGURE 5. HCL clustering of quantitative metabolite data obtained using Levenberg-Marquardt multivariate linear regression method with spectral measurements for 41 metabolites from metabolomics databases with metabolite values normalized (divided by standard deviation and meancentered). Sample types are grouped similarly, based on these quantitative metabolic data, as they were with spectral data in Figs. 3 and 4. choline and its derivatives than all the other sample groups. This is highly significant as choline is already widely used as marker for in vivo brain tumor diagnosis and, although it is present in all tumors, it might make a more sensitive marker for Group 1 tumors. Interestingly, GBM cell lines included in this group, LN229 and LN319, also cluster based upon their molecular footprints as they strongly express PDGFRA at the transcript levels (PDGFRAϩ), but have low levels of EGFR transcript (EGFRϪ), as shown in Fig. 7. Choline is part of several pathways, most importantly metabolism of glycerophospholipids and other lipids. The analysis of the glycerophospholipid pathway shows overexpression of several important enzymes related to choline processing as well as genes involved in cho-line import into cells. Fig. 8A shows molecular transport genes that are, according to a literature search provided by the Pathway Studio software, related to choline and its derivatives. Clearly, several genes previously associated with the transport of choline are overexpressed in Group 1 cells. SCL44A (intermediate-affinity choline transporter-like protein), in particular, has been indicated as a major choline transporter in breast cancers with a connection to the Hsp90 chaperone protein (25). At the same time, EGFR, which is underexpressed in Group 1 cancers based on microarray as well as RT-PCR data (Fig. 7), has been associated with reduction of monolayer permeability to choline (9). A relationship between SCL44A and EGFR and choline metabolic profile has also been shown recently in metabolomics analysis of breast cancer cell lines (10). Furthermore, it should be kept in mind that EGFR mutation was identified as one of the major genetic differences between genomically determined GBM subtypes (8).
Group 1 also shows significant overconcentration of inositol. Inositol has been suggested as an age-independent marker in FIGURE 6. HCL clustering of metabolites selected as most differentially concentrated between the four groups. The feature selection was performed from quantitative metabolite data using SAM analysis. Prior to SAM analysis, metabolite values were normalized (divided by standard deviation and meancentered). SAM ⌬ value was 0.2 with zero median number of false significant metabolites. Outlined are sample and metabolite groups, which show the metabolites that are most significant for separation of samples for each group from all the other groups.  prostate cancers (26). In mammalian tissues and cells, inositol exists primarily in its free form or bound covalently to phospholipid as phosphatidylinositol. myo-Inositol is elevated in gliomas relative to normal brain tissue and is involved in osmoregulation and volume regulation (6). Inositol incorporation into phosphatidylinositol is part of glycerophospholipid metabolism and is catalyzed by inositol transferase (17). At the same time, inositol is transferred to the cytoplasm by several transporters such as sodium/myo-inositol cotransporter (SLC5A3) and solute carrier family 2 (SLC2A13). Differences in inositol levels between the four GBM subgroups can result from different levels of metabolite incorporation into larger molecules (making it more challenging to detect via NMR) or from differences in transport. Genes involved in transport (SLC5A3 and SLC2A13) and metabolism (inositol transferase) of inositol are shown in Fig. 8B with color corresponding to relative gene expression in group 1 relative to groups 2, 3, and 4 (subplot B1); group 2 relative to groups 1, 3, and 4 (subplot B2); group 3 relative to groups 1, 2, and 4 (subplot B2), and group 4 relative to groups 1, 2, and 3 (subplot B4). According to gene expression data shown in Fig. 8, subplot B1, overconcentration of inositol in group 1 is due to increased metabolite transport and decreased incorporation into larger NMR "invisible" molecules. In the other three groups, inositol is underconcentrated, and this is due to: decreased transport (group 2); rapid incorporation into a large molecule (group 3); or faster incorporation into a larger molecule as well as reduced transport (group 4).
Glutamine and glutamate are overconcentrated in several groups with glutamine overconcentrated in groups 1 and 3 and glutamate overconcentrated in groups 1, 2, and 3. Increased glutamine concentration in cancers has been reviewed by Dang (27). It was proposed that glutamine is transported into the cell through glutamine/amino acid transporter and converted to glutamate by glutaminase. Glutamate is catabolized to ␣-ketoglutarate for further oxidation in the tricarboxylic acid cycle. In Fig. 9, we show the genes connected in the literature with aspartate, glutamine, glutamate, and citrate as these metabolites show similar behavior across the groups of samples and are metabolically connected. Fig. 9 shows some of the major transporters for these metabolites across cellular and mitochondrial membranes. In group 1, aspartate, glutamine, glutamate, and citrate are overconcentrated. Expression levels FIGURE 8. Direct connection network between gene expression and metabolites that are overconcentrated in group 1. A, genes directly involved in the transport of choline (particularly SCL44A1) are overexpressed in group 1 cells. In the figure, genes that are overexpressed in Group 1 relative to the other groups are shown in red, and the ones that are underexpressed in group 1 are shown in blue. B, relation between inositol and its transporter genes (SLC5A3 and SLC2A13) as well as enzyme involved in its digestion (inositol transferase). In subplots B1-B4, coloring is used to describe gene expression difference across four cell line types. Subplot B1 shows expression of genes in group 1 relative to groups 2, 3 and 4; subplot B2 shows expression of genes in group 2 relative to groups 1, 3, and 4; subplot B3 shows expression of genes in group 3 relative to groups 1, 2, and 4; and subplot B4 shows expression of genes in group 4 relative to groups 1, 2, and 3. Inositol is overconcentrated in group 1, and this can be related to the overexpression of its transporters as well as underexpression of the digestion enzyme. CALCA, calcitonin-related polypeptide alpha; CNTF, ciliary neurotrophic factor; NISCH, nischarin; NTS, neurotensin; APP, amyloid beta (A4) precursor protein; CDP-DG, cytidine diphosphate-diacylglycerol.
of transporter genes in group 1 relative to all the other samples suggest that overconcentration of these metabolites results from the enhanced input of glutamine into the cell (by SLC38A1 and SLC7A8) as well as glutamate and aspartate (by SLC1A type transporters) and enhanced transport of citrate out of the mitochondria, which could lead to enhanced production of citrate due to the kinetics of reversible reactions in the TCA cycle. In group 2, the concentration of glutamine is reduced and, according to gene expression investigation, this can be ascribed to reduced transport as well as increased input into mitochondria (with OGDH gene overexpressed). This group also shows reduced levels of aspartate and glutamate transporters. However, their mitochondrial production is enhanced by higher expression of OGDH and SLC25A13. In group 3, all four metabolites are once again overconcentrated, and both major glutamine transporters are clearly overexpressed, but so are mitochondrial transporters, thus leading to higher throughput and higher concentrations of glutamate and aspartate. In group 4, all four metabolites are underconcentrated, and this can be connected to the fact that both major transporters for glutamine are underexpressed. Finally, as shown in Fig. 7, group 4 cell lines, U373 and BS149, also have the common characteristic of strongly expressing PDGFRA and EGFR at the transcript levels (PDGFRAϩ and EGFRϩ), further supporting this cluster.
A final example of the connection between metabolic results and gene expression measurements is the analysis of glycerol 3-phosphate (G3P) metabolism and related genes. G3P is an intermediate in triacylglycerol and glycerophospholipid metabolism, both of which are known to be altered in cancers. In our comparison of GBMs, G3P is overconcentrated in Group 4 cell lines. Fig. 10 shows part of the triacylglycerol and glycerophospholipid metabolic pathways where G3P is synthesized and degraded. Once again, there are changes in relative gene expression levels in the four groups of samples. In groups 1 and 2, all enzymes in these branches of pathways are underexpressed, leading to lower G3P levels. In group 3, triacylglycerol pathway production of G3P is reduced (gene underexpression). FIGURE 9. Direct connection network between gene expression and metabolites that are overconcentrated in some of the groups. Included are L-glutamine (overconcentrated in groups 1 and 3), L-glutamate (overconcentrated in groups 1, 2, and 3), L-aspartate (overconcentrated in groups 1, 2, and 3), and citrate (overconcentrated in groups 1, 2, and 3) with metabolite overconcentration highlighted in red. Genes are colored based on their expression in one group relative to all the others, i.e. panel 1 is expression in 1 relative to 2, 3, and 4; panel 2 is expression in 2 relative to 1, 3, and 4; panel 3 is expression in 3 relative to 1, 2, and 4, and panel 4 is expression in 4 relative to 1, 2, and 3. According to the changes in metabolite concentration and gene expression, we are hypothesizing major transporters for each metabolite listed, and this is outlined with red arrows on the graph. PC, phosphorylcholine.
However, glycerophospholipid metabolism is enhanced in G3P production as well as digestion, thus leading to reduction of G3P concentration as well as glycerol. Finally, in group 4, triacylglycerol and glycerophospholipid pathway production of G3P are enhanced. However, G3P digestion is reduced (by underexpression of glycerophosphate transacylase), and therefore, its concentration is increased.
Several genes known to affect cell metabolism show significantly different expression levels between the four groups of nine cell lines. Gene set enrichment analysis (28,29) of genes that are significantly overexpressed in each group relative to the other groups pointed out several metabolic processes that could be significantly affected by overexpressed genes. For example, genes overexpressed in group 1 include fatty acid synthase as well as several other genes from the fatty acid biosynthesis pathway. Group 2 overexpressed genes are significantly prevalent members of the amino acid metabolism pathway (Ser/Gly/Thr/Cys metabolism), and indeed, the majority of overconcentrated metabolites identified in this group are branched chain amino acids. Furthermore, there are clear expression level differences in PDGFRA, IDH1, EGFR, and NF1 (8) in our four metabolic groups of samples (Fig. 7). Out of these four genes, two are directly related to metabolite concentrations (EGFR and IDH1), and all four have been shown to affect several major genes involved in control of metabolism (such as HIF1A, mTOR, HRAS, and GLRX1).

CONCLUSION
We have performed unsupervised and supervised analysis of 1 H NMR measurements of nine GBM cell lines. Analysis was performed directly on spectral data and also on quantified metabolic data determined from the spectra. The presented analysis clearly shows that GBM cell lines have different metabolic profiles and that it is possible to determine groups of cell types based on NMR metabolomics. Spectral analysis established four groups of cancer types. For meta- FIGURE 10. Partial representation of triacylglycerol and glycerophospholipid metabolism related to synthesis and digestion of glycerol 3-phosphate. Glycerol 3-phosphate is overconcentrated in group 4 (highlighted in red). The gene colors represent relative expression levels in: panel 1, 2, group 1 relative to groups 2, 3, and 4 and group 2 relative to groups 1, 3, and 4 (same relative expression); panel 3, group 3 relative to groups 1, 2, and 4; and panel 4, group 4 relative to groups 1, 2, and 3, where red shows overexpression and blue shows underexpression. The branch of the triacylglycerol pathway is circled. The presented genes are: a, monoglyceridase; b, 2-lysophosphatidylcholine acylhydrolase; c, glycerate kinase; d, glycerophosphate transacylase. Glycerol 3-phosphate is overconcentrated in group 4 cells, and this can be related to up-regulation of 2-lysophosphatidylcholine acylhydrolase and glycerate kinase, both involved in glycerol 3-phosphate synthesis, and down-regulation of glycerophosphate transacylase, which is involved in its digestion. LPA, lysophosphatidic acid.
bolically determined GBM subtypes, we have resolved the major metabolic differences. Metabolic markers for each group of samples were correlated with publicly available gene expression data for these cell lines. Metabolic analysis generally agrees with gene expression measurements, and it was possible to explain metabolomics results based on gene expression data. It is clear from this work that to make conclusions from gene expression analysis, it is necessary to look at networks and pathways rather than individual genes. Metabolomics thus provides an excellent way to focus on gene expression analysis and to highlight major changes in pathways that are regulating key cellular processes. Our results agree with recent studies in GBMs showing that different subtypes exist for these tumors and that they likely respond differently to treatments. This is in agreement with recent publications that have shown that glioblastomas have distinct subtypes based on both gene expression and genomics (8), and we have shown correlation between our metabolic subtypes and major genes obtained as markers for genomic subtypes. Metabolites identified can be used as noninvasive markers of these subtypes. Future work will focus on the analysis of tumor tissue samples and will investigate whether these GBM subtypes can be established there as well.