Genome-scale Reconstruction of Metabolic Network in Bacillus subtilis Based on High-throughput Phenotyping and Gene Essentiality Data*

In this report, a genome-scale reconstruction of Bacillus subtilis metabolism and its iterative development based on the combination of genomic, biochemical, and physiological information and high-throughput phenotyping experiments is presented. The initial reconstruction was converted into an in silico model and expanded in a four-step iterative fashion. First, network gap analysis was used to identify 48 missing reactions that are needed for growth but were not found in the genome annotation. Second, the computed growth rates under aerobic conditions were compared with high-throughput phenotypic screen data, and the initial in silico model could predict the outcomes qualitatively in 140 of 271 cases considered. Detailed analysis of the incorrect predictions resulted in the addition of 75 reactions to the initial reconstruction, and 200 of 271 cases were correctly computed. Third, in silico computations of the growth phenotypes of knock-out strains were found to be consistent with experimental observations in 720 of 766 cases evaluated. Fourth, the integrated analysis of the large-scale substrate utilization and gene essentiality data with the genome-scale metabolic model revealed the requirement of 80 specific enzymes (transport, 53; intracellular reactions, 27) that were not in the genome annotation. Subsequent sequence analysis resulted in the identification of genes that could be putatively assigned to 13 intracellular enzymes. The final reconstruction accounted for 844 open reading frames and consisted of 1020 metabolic reactions and 988 metabolites. Hence, the in silico model can be used to obtain experimentally verifiable hypothesis on the metabolic functions of various genes.

Bacillus subtilis has been the organism of choice for the production of several important industrial products, including antibiotics, enzymes, nucleosides, and vitamins. Several aspects of the biochemistry, genetics, and physiology of B. subtilis have been studied extensively making B. subtilis the best characterized prokaryote second only to Escherichia coli (1,2). In addition various "-omics" data sets such as transcriptomic (3), proteomic (4), and metabolomic (5) are available for B. subtilis. This wealth of experimental data enables the development of a genome-scale metabolic in silico model that can be used not only for quantitative interpretation and structured integration of such extensive data sets but also as a tool for hypothesis generation and engineering of B. subtilis metabolism.
To date, constraint-based reconstruction and analysis (COBRA) 4 of cellular metabolism has been employed successfully to develop organism-specific genome-scale in silico models that have enabled numerous applications (6). Unlike other modeling strategies such as kinetic (7), stochastic (8), and cybernetic (9) methods, the COBRA approach does not attempt to compute precisely what a biochemical network does; rather, it seeks to distinguish between the network states that are achievable from those that are not, based on a detailed reconstruction of metabolism and incorporation of physiological parameters that are consistent with known experimental information. The COBRA approach is based on the successive imposition of governing physicochemical constraints on genome-scale reconstructions. The COBRA approach has been employed to generate genome-scale in silico models from each of the three major domains of the tree of life: archaea (i.e. Methanosarcina barkeri (10)), bacteria (i.e. E. coli (11)), and eukarya (i.e. Saccharomyces cerevisiae (12)).
For B. subtilis, a stoichiometric model consisting of around 35 reactions in the central metabolism and the purine metabolism has been constructed previously (13,14). The model has been used to obtain information on the intracellular flux distribution based on fractional carbon isotope labeling (15) and for the estimation of the P/O ratio (16) and the prediction of maximum theoretical yields of commercial biochemical products (14). Here we describe the reconstruction of a highly detailed and validated genome-scale metabolic network for B. subtilis, which reconciles established genomic and physiological data with high-throughput phenotyping data generated herein. The process used for the reconstruction illustrates the iterative nature of model development. Furthermore it demonstrates how the model can be used to highlight conflicts and discrepancies between various experimental data sets, and design experiments to resolve any gaps in our understanding of cellular metabolism, thereby accelerating biological discovery.

EXPERIMENTAL PROCEDURES
Metabolic Network Reconstruction-The reconstruction procedure followed the approach outlined in Edwards and Palsson (17). The reconstruction was carried out in SimPheny (Genomatica, Inc., San Diego, CA), a platform for development of cellular metabolic models. Lists of the genes, proteins, and reactions, corresponding literature information, definitions of the metabolite abbreviations, and a list of the exchange fluxes are described in the supplemental material (Bacillus subtilis Model Excel file). All the gene-protein-reaction (GPR) associations are provided in a GPR Associations zip file as JPEG images, which can be obtained from the authors, and a detailed document describing how to interpret these images is available elsewhere (11).
Biomass Composition-An equation for biomass formation was developed to account for the drain of precursors and building blocks into biomass. Biomass synthesis was incorporated as a linear combination of six macromolecular components (protein, DNA, RNA, lipid, lipoteichoic acid, and cell wall components) along with ions and metabolites, which were considered to account for the overall biomass composition. Lipoteichoic acid is a polymer of glycerolphosphate 24 -33 units long, linked by phosphodiester bonds with a 3-gentiobiosyldiglyceride as the membrane anchor, and protrudes through the cell wall (18). The detailed calculation of the biomass composition is provided in the supplemental Biomass Composition Word file.
In Silico Computations-The metabolic capabilities of the B. subtilis network were calculated by using flux balance analysis and linear optimization (17). For growth simulations, biomass synthesis (see above) was selected as the objective function to be maximized, and optimization was solved using Linear Programming techniques in the SimPheny platform. In addition to the metabolic reactions, reversible exchange reactions for all external metabolites were also included in the simulations to allow external metabolites to cross the system boundary. All flux values during simulation are in millimoles/g cell/h. For simulation of aerobic growth on minimal media, the following external metabolites were allowed to freely enter and leave the network: K ϩ , Na ϩ , Mg 2ϩ , Ca 2ϩ , Fe 3ϩ , CO 2 , H 2 O, and H ϩ (exchanges fluxes, Ϫ1 ϫ 10 6 to 1 ϫ 10 6 mmol/g cell/h). Aerobic conditions were simulated by setting the lower and upper limits for the O 2 exchange flux to Ϫ1 ϫ 10 6 mmol/g cell/h and 0, respectively. All other external metabolites, except for the substrates tested, were only allowed to leave the system (exchanges fluxes, 0 to 1 ϫ 10 6 mmol/g cell/h). Growth on different substrates was simulated by allowing corresponding external metabolites to enter the system with a maximum uptake rate of 5 mmol/g cell/h if not specified (exchange flux, Ϫ5 to 1 ϫ 10 6 mmol/g cell/h). For simulation of growth on complex Luria-Bertani (LB) medium, the chemical composition of LB medium was approximated based on typical analysis of the yeast extract and Tryptone provided by the manufacturers (see supplemental Complex Medium Composition Word file). The threshold of specific growth rate for cell viability was assumed to be 10 Ϫ8 h Ϫ1 . All simulation conditions are detailed in the supplemental Simulation Conditions Excel file.
Biolog Phenotyping Experiments-B. subtilis 168 (ATCC 23857) was obtained from the American Type Culture Collection (ATCC). Biolog's Phenotype MicroArray TM technology (19) was used for the phenotypic analysis of B. subtilis. It permits assays of 190 carbon (PM1-and PM2A-Microplates), 95 nitrogen (PM3B-Microplate), and 59 phosphorus and 35 sulfur source (PM4A-Microplate) utilizations at once. A defined medium containing 25 mM sodium pyruvate, 25 mM glucose, 5.0 mM NH 4 Cl, 2.0 mM NaH 2 PO 4 , 0.25 mM Na 2 SO 4 , 100 mM NaCl, 30 mM triethanolamine HCl (pH 7.1), 0.05 mM MgCl 2 , 1.0 mM KCl, 1.0 mM FeCl 3 , and 0.01% tetrazolium violet was used for the Phenotype Microarrays TM tests. The (PM) plates contained various carbon, nitrogen, phosphorus, or sulfur sources, which are omitted from the defined medium. The microplates were incubated at 37°C, and the absorbances were measured both at 570 and 750 nm after 24-and 48-h incubations, respectively, resulting in a total of 8 measurements. The detailed experimental conditions were described elsewhere (20). The colorimetric assay was considered as positive when the absorbance corresponding to the reduced dye (indicating substrate utilization) was 1.2 times higher than that of negative control. This threshold was chosen based on the calculation of standard deviations of the absorbance of the reduced dye in the negative control wells (S.D. was ϳ19%). The confidence levels for the data were determined based on the number of positive reactions out of the 8 measurements: high-confidence (growth, 8; no growth, none), medium-confidence (growth, 6 or 7; no growth, 1 or 2), and low-confidence (3 to 5). Low-confidence data were used for in silico analysis only if corresponding experimental data were available from the literature. Lists of the compounds, confidence level on data, predicted growth rates, and corresponding literature information are detailed in the supplemental Table S1.

RESULTS
Metabolic Reconstruction-The overall iterative model development procedure of genome-scale metabolic reconstruction for B. subtilis was based on established reconstruction methods augmented with high-throughput substrate utilization experiments and large-scale gene essentiality data sets (Fig. 1). First, a B. subtilis metabolic network (model v1 in Fig. 1) was reconstructed based on the annotated genome sequence (21) and available biochemical data (1, 2) containing 940 reactions. The metabolic network was expanded in size to 1015 unique reactions (model v2 in Fig. 1) based on high-throughput Biolog phenotyping data (see "Experimental Procedures") and network-based gap analyses. Metabolic model v2 computed the qualitative outcome of substrate utilization data with 74% agreement (200 in 271 cases). Finally, metabolic model v2 was compared with a large-scale gene essentiality study (22), and this analysis led to the addition of five metabolic reactions to model v2 . This expanded model could qualitatively predict the effect of gene deletions on growth with 94% accuracy (720 in 766 cases) (model v3 in Fig. 1, and see details in the following sections).
Characteristics of the Reconstructed Network-The characteristics of the reconstructed B. subtilis metabolic network are detailed in Fig. 2. The completed reconstruction (metabolic model v3 ) accounted for 844 open reading frames (ORFs) and consisted of 1020 reactions and 988 metabolites ( Fig. 2A). The details of the list of genes, reactions, and the GPR associations in the reconstruction are available as supplemental information. The functional classification of the 844 ORFs included in the reconstruction is summarized in Fig. 2B. The majority of the ORFs are associated with the transport of metabolites indicating the abundance of various classes of transporters in B. subtilis. 191 of 244 transporters included in the reconstruction were identified based on genomic and biochemical information, but 53 were based on high-throughput Biolog phenotyping and gene-essentiality analyses. For example, Biolog phenotyping indicated that D-malic acid, L-arabitol, glyoxylic acid, or dulcitol could be utilized by B. subtilis as a carbon source, and its corresponding transporter was incorporated into the biochemical-reaction network (see supplemental Tables S2 and S3 for lists of transporters).
The basic capabilities of the in silico model to predict quantitatively aerobic growth on various carbon sources were determined (Fig. 2C). A growth demand function was formulated based on biomass composition detailing the required metabolites in the appropriate ratios (see supplemental Biomass Composition Word file). This demand function was used as the objective function for flux balance analysis analysis. The growth rates for different substrate uptake rates were computed based on minimal media (see "Experimental Procedures" for simulation conditions) and compared with the experimental results of aerobic chemostat cultivation (13,23). Under these conditions, the model used the caa 3 terminal oxidase activity as part of the optimal flux distribution and resulted in higher in silico growth rates than experimental results (Fig.  2, C and D). However, when the flux through caa 3 oxidase reaction was restricted to zero under the same conditions to reflect the experimentally observed repression of caa 3 oxidase in the presence of glucose (see below), the in silico model could correctly compute the experimental growth rates of B. subtilis. In B. subtilis, four different terminal oxidases are known to exist, which catalyze the transfer of electrons to O 2 with different energy-coupling efficiencies (16,24) (Fig. 2D). Three oxidases accept electrons from menaquinone and one oxidase from cytochrome c. The main electron flow in glucose-grown B. subtilis is via the aa 3 oxidase (25), whereas the bd oxidase appears to be relevant mainly at low O 2 tension (26). The ythA and ythB genes of B. subtilis are likely to encode a terminal oxidase related to the bd-type oxidases (24), and, in this study, the H ϩ /e Ϫ stoichiometry was assumed to be 1.0. The cytochrome c branch, which contains the energy-coupling menaquinone:cytochrome c oxidoreductase (bc complex; H ϩ /e Ϫ , 1.5) and caa 3 oxidase (H ϩ /e Ϫ , 2), is most energetically favorable. However, the expression of caa 3 oxidase-encoding ctaCDEF genes is inhibited by catabolite control protein A, and therefore, the flow of electrons via cytochrome c branch is undetectable in glucose-grown B. subtilis (27). This example highlights the utility of the integrated modeling and experimental approach in uncovering potential novel regulatory interactions that affect cellular metabolism.
Analysis of High-throughput Substrate Utilization-The in silico computations and high-throughput phenotyping data from Biolog's Phenotyping MicroArray TM technology (Biolog Hayward, Inc., CA (19)) were compared (Fig. 3). 271 of 379 substrates tested (190 for carbon, 95 for nitrogen, 59 for phosphorus, and 35 for sulfur sources) were identified as data with sufficient confidence (see "Experimental Procedures" for determining the confidence) and were analyzed for consistency using the B. subtilis model. In the case of nitrogen and sulfur sources, obtaining high confidence data was difficult due to the high background of negative control, and 44% (42 in 95) and 49% (17 in 35) of the compounds tested could not be used for in silico analyses, respectively. A list of substrates, confidence lev- els of data, model refinements, predicted growth rates, and literature information for corresponding substrates, are described in supplemental Table S1. Growth on substrates was simulated by fixing its specific uptake rate at 5 mmol/g cell/h under aerobic conditions based on minimal media (see "Experimental Procedures" for simulation conditions). However, it should be noticed that the assumption of utilization rate of substrate might affect growth prediction (see below). The network (metabolic model v1 in Fig. 1), which was developed only based on genomic and biochemical information, could predict qualitatively the outcomes of Biolog data with 52% accuracy (140 in 271), but, after network expansion and metabolic gap analysis, overall prediction accuracy (metabolic model v2 in Fig.  1) was considerably improved to 74% (200 in 271) (Fig. 3A). However, 14 disagreements (7 for carbon, 4 for nitrogen, and 3 for sulfur sources) were observed (Fig. 3B), and out of these 14, 8 cases were compared with experimental growth data available in the literature by Medline search for the corresponding substrates (Fig. 3C).
Biolog phenotyping result and Mortl et al. (29) indicated together that glycine could not be utilized by B. subtilis as a sole carbon source, but the model showed growth, although the predicted growth was low at 0.036 h Ϫ1 for a specific uptake rate of 5 mmol/g cell/h. However, it should be noticed that for a lower specific uptake rate of 1 mmol/g cell/h no growth was predicted. Glycine is converted to glyoxylate, hydrogen peroxide, and ammonia by glycine oxidase (30). Hydrogen peroxide is toxic, and its production during glycine fermentations at 25 mM (Biolog phenotyping) and 20 mM (29) might also inhibit cell growth of B. subtilis. For other cases, relatively higher growth rates were predicted and accordingly presented in supplemental Table S1 without further computations. Direct comparison of present Biolog results and growth data from literature is difficult, because fermentation conditions, including medium composition, can affect significantly cell growth. In addition, in many reports the experimental conditions have been poorly defined. Nevertheless, for the remaining cases, in silico predictions generally agreed with experimental data from the literature but not with the phenotype micro-arrays. For example, B. subtilis does not have a glyoxylate shunt, a pathway that allows acetate to be used as a sole carbon source (31). Bryan et al. (32) reported that no growth was observed for B. subtilis with acetate as a sole carbon source. The metabolic network could produce NADH and ATP (namely aerobic respiration), but not biomass from acetate. In the high-throughput phenotype data, all of eight measurements for acetate indicated positive growth compared with that of the negative control (see supplemental  Table S1). Biolog phenotyping system is based on colorimetric assay utilizing chemical reduction (purple color formation) of tetrazolium dye by NADH as a reporter system (19). However, NADH is an important electron carrier both in respiration and  Fig. 1). A, statistics. (All but one of the reactions incorporated in the model are elementally and the charge is balanced. Pyridoxal-5Ј-phosphate synthase has not been fully characterized biochemically (28), so assumptions had to be made about the participating metabolites. However, it should be noted that this enzyme reaction was not used for the growth of B. subtilis networks under fermentation conditions tested.) B, functional classification of metabolic genes included in the model. C, prediction of growth rates with different carbon sources under aerobic culture conditions. D, schematic of aerobic respiratory pathways in B. subtilis (see text for details). v glc , v ac , v cit , v glcn-D , and v actn-R are the specific utilization rates of glucose, acetate, citrate, gluconate, and acetoin, respectively, which were obtained from graphical data or from experimental linear relationship versus specific growth rates of references (see Fig. 5  cell growth events, and cell respiration can occur independent of cell growth. The growth predictions of the in silico model for the Biolog substrates evaluated as low-confidence data were also compared with the data available in the literature by Medline search for the corresponding substrates. For 18 out of the 108 lowconfidence cases, corresponding experimental growth data were available from biochemical literature, and the model could correctly predict the utilization of these substrates in 94% accuracy (17 of 18 cases) (Fig. 3D). Incorrect prediction for D-serine was further analyzed. Fisher and Debarobuille (33) indicated no growth with D-serine as a sole nitrogen source, although the growth condition was not defined, the model simulations suggested the growth was possible. In the biochemical network, dsdA gene was assumed to encode D-serine deaminase (EC 4.3.1.18) based on a sequence similarity, which is necessary for utilizing D-serine as a sole nitrogen source (2). Therefore, the functionality of dsdA gene product or toxic effect of D-serine on cell growth (34) could be the subject of further experimental investigations. The examples of glycine and serine utilization illustrate the capability of the COBRA approach to pinpoint potential gaps in the understanding of metabolism and to guide experimental design.
As a result of the high-throughput substrate utilization experiments, the B. subtilis network was expanded in size from 940 to 1015 unique reactions through the use of network gap analysis to identify additional reactions that would reconcile the model with the experimental data. Network gap analysis first finds metabolites that are either consumed only or produced only, and therefore, violate the material balance on the metabolites. Then the candidate reactions that could fill the gap are identified by analyzing the online pathway data base such as Kyoto Encyclopedia Genes and Genomes (KEGG) and/or biochemical literature, and metabolic pathway(s) on the basis of physiological information such as Biolog phenotyping, could be completed (6). This analysis resulted in the identification of 75 reactions in model v2 (4 for amino acids, 16 for carbohydrates, 1 for cell wall, 3 for nucleotides, 2 for phosphate, and 49 for transporters; see supplemental Table S2 for details). Note that these reactions do not have genetic or biochemical evidence but would be required to sustain growth on corresponding substrates. The protein sequences from other organisms where the protein associations with the reaction are characterized were used in BLASTP searches (39) of the B. subtilis genome SubtiList data base (40) to identify candidate ORFs that may have the corresponding functions. The conserved domain(s) and sequence of ORF product identified were also compared against source protein in an NCBI data base (www.ncbi.nlm.nih.gov) and using a ClustalW program (41) at European Bioinformatics Institute, respectively. These analyses Results are scored as ϩ or Ϫ meaning growth or no growth determined from in vivo/in silico data. The n indicates that the corresponding pathway could not be included in the B. subtilis network due to unknown pathways or no growth. C, comparison of the incorrect predictions (ϩ/Ϫ and Ϫ/ϩ cases in B) to published experimental results. D, comparison of in silico predictions to published experimental results for the Biolog substrates identified as low-confidence data. The Biolog data were considered as low confidence growth when the inference of growth was difficult, because only 3 to 5 of the 8 measurements showed positive reactions compared with negative control. 1 , from Biolog phenotyping; 2 , from literature; 3 , in metabolic network, very slow growth (0.036 h Ϫ1 ) was predicted at the specific uptake rate of 5 mmol/g cell/h. lead to the assignment of putative function to several genes (Table 1). Specifically, Biolog phenotyping indicates that B. subtilis can utilize various substrates such as D-malic acid, L-arabitol, glycolic acid, glyoxylic acid, or dulcitol as a sole carbon source. Thus, the pathways that correspond to the utilization of these carbon substrates have to be present. These putative assignments represent hypothesis obtained based on the integration of the model and high-throughput phenotyping analysis and could be the subject of further experimental investigation to characterize the ORFs.
Analyses of Genetic Deletions and Metabolomic Data-The impact of single-gene deletions on growth in B. subtilis was analyzed by simulating its growth on rich medium (see "Experimental Procedures" for determination of chemical composition) under aerobic conditions, where the reaction corresponding to the deleted gene was removed. The in silico results were compared with large-scale experimental gene essentiality data as supplied by the Bacillus subtilis genome data base (22). Detailed analysis of the incorrect predictions resulted in the addition of five reactions to the metabolic network (one for lipid metabolism and four for transporters; see supplemental Table S3 for details). The in silico computations were in qualitative agreement with experimental results with 94% accuracy (720 in 766) ( Table 2). The complete list of genes analyzed is presented in supplemental Table S4. The incorrect predictions, 18 false-negative (in vivo, growth; in silico, no growth) and 28 false-positive (in vivo, no growth; in silico, growth), were further analyzed. In the 18 false-negative predictions (supplemental Table S5), 12 cases were due to the definition of the in silico biomass function used for objective function of flux balance analysis, which was developed based on the wild-type B. subtilis (see "Experimental Procedures" for biomass composition).  (22) a From the KEGG compound database. b Gene names are based on the SubtiList database. c Suggested annotation arose from analyzing the network gaps in model v2 in Fig. 1. d E-values were obtained at the protein sequence level based on BLAST 2 alignment of sequences obtained from SubtiList and NCBI databases. The conserved domain(s) and sequence of ORF product identified were also compared against source protein in a NCBI database and using a ClustalW program at EBI, respectively (see supplemental Figs. S1 and S2). e YvgN protein was expressed in E. coli and showed glyoxal reductase (EC 1.1.1.78) activities for glyoxal, methylglyoxal, glyoxylate, glyceraldehyde, and glyceraldehyde 3-phosphate (43). However, it should be noticed that aldose reductase generally have a broad substrates specificity, including arabitol, glyoxal, methylglyoxal, and glyceraldehyde (BRENDA database). f The tartronate semialdehyde reductases are a structurally and mechanistically related family of enzymes that includes 3-hydroxyisobutyrate dehydrogenase, 6-phosphogluconate dehydrogenase, and a novel D-phenylserine dehydrogenase (44). g In the present model, yfjR was assumed to encode 3-hydroxyisobutyrate dehydrogenase from sequence evidence. However, the functionality of this gene product needs to be verified experimentally.

TABLE 2 Impact of single-gene deletions on growth in B. subtilis
For 766 of 844 metabolic genes included in the model, experimental gene deletion data were available from the Bacillus subtilis Genome data base (22), and the deletions of these genes were analyzed by simulating their growth on rich medium. These metabolic genes (pssA, psd, yfiX, ugtP, gtaB, dltABCD, ggaA, ggaB, and tagE) encode enzymes catalyzing the generation reaction of lipids and cell wall components, which are not essential for cell growth under normal conditions (22,(45)(46)(47)(48). However, because the biomass composition defined in silico required the synthesis of these reactions, the corresponding genes were identified as lethal deletions. The remaining six cases (ndk, bkdAA, bkdAB, bkdB, ipdV, and yubB) appear to be due to metabolic gaps in the present biochemical network and/or the availability of external sources of the biomass precursors from rich media (22,49). This could be subject of further experimental investigation. For the 28 false-positive disagreements (supplemental Table S6), 15 cases could be explained by accumulation of toxic compounds induced by gene deletions (tpiA and yueK), disruption of other functions such as pH homeostasis maintenance (mrpABCDF), and formylation of methionyl-tRNA (fmt), conditional essentiality (pgk), or regulation effects (racE, fbaA, murAA, yumC, and trxB) (22,(35)(36)(37), most of which are not yet predictable by the standard flux balance approach. metK, ytaG, and acpS products are related to syntheses of siroheme, coenzyme A, and acylcarrier protein, respectively, which might be necessary for cell viability. The remaining 11 false-positive cases were related to carbohydrate (pdhA, odhB, tkt, and pfkA) and nucleotides and nucleosides (guaB, hprT, pyrG, cmk, nrdE, nrdF, and ymaA) metabolism but could not be explained by the in silico model. First, the enzymes revealed as essential might have additional unexpected roles in the cell. For example, both 2-oxoglutarate (OdhA, OdhB, and PdhD) and pyruvate (PdhD, PdhAB, and PdhC) dehydrogenase are composed of three components, but only odhB and pdhA (␣-subunit of PdhAB) genes were essential. hprT, a gene from the purine salvage, was also essential. Second, the model might need to account for additional features of kinetic or transcriptional regulation. It should also be noticed that subtle differences in the experimental conditions and the strain background could induce the difference between lethal and a very slow growth mutation (22). The in silico metabolites of the metabolic network were also compared with experimental metabolomic data for B. subtilis grown in a moderately rich medium (5). 350 of 1692 compounds separated have been positively identified using standards (see supplemental Table 1 of Ref. 5) and were analyzed for consistency using the model. The biochemical network could only account for 160 (46%) of the 350 compounds identified (supplemental Table S7). This appears to be due to metabolic gaps in the present model. 270 (77%) of the 350 compounds could be referenced by the online KEGG compound data base and/or Chemical Abstracts Service number, but the detailed metabolisms except for the in silico metabolites of the network have not been reported. The remaining 613 unique in silico metabolites could be the subject of further metabolomic investigations.

DISCUSSION
Herein, we have developed a genome-scale metabolic model of B. subtilis, a Gram-positive organism that is of both scientific and commercial importance (1,2). The model-development process is iterative by nature and moves from the reconstruc-tion of a genome-scale metabolic network to a fully functional in silico model that can be used to compute phenotypes. First, a metabolic reconstruction was developed that reconciles the genome sequence and annotation together with the known biochemistry and physiology of the organism. This reconstructed network was validated for consistency against newly generated high-throughput substrate phenotyping data and existing large-scale gene essentiality data. The model was subjected to iterative refinement based on the identified inconsistencies, leading to additional reactions incorporated in the network and other modifications to the model content. The net result was a biochemically and genetically detailed in silico model that consists of 1020 metabolic reactions that are catalyzed by 844 ORFs (ϳ20% of the genes in B. subtilis). For B. subtilis, a stoichiometric model consisting of around 35 reactions in the central metabolism and the purine metabolism has been constructed previously (13,14). However, with the genome-scale model, it was possible to compute not only quantitatively aerobic growth of B. subtilis, but also qualitatively its phenotypic behaviors for substrate utilization in 74% agreement (200 in 271 cases) and the outcome of single-gene deletions with 94% agreement (720 in 766 cases). Hence, the in silico model could represent a framework wherein a variety of heterogeneous data types are integrated and analyzed in a structured format (38) and be subject to further investigations (see below).
The computation of cellular functions from reconstructed networks allows for study of the relationship between environmental and genetic factors and their integrated function to orchestrate cellular phenotypes. The model development has also generated several experimentally verifiable hypotheses that could provide insight into the functioning of the large-scale metabolic network. An example of hypotheses generated in this model is the set of new annotation for metabolic genes based on network-based gap analysis and high-throughput phenotyping. Biolog phenotyping indicates that B. subtilis can consume various carbon substrates such as D-malic acid, L-arabitol, glycolic acid, glyoxylic acid, and dulcitol for cell growth. In Table 1, the reactions that are missing from these pathways and corresponding genes are identified based on network-gap and sequence analyses. Such reactions and genes should be validated with genetic and biochemical experiments to confirm their role in cellular metabolism. The second example is depicted in Table S5 in which the in silico deletions of the six metabolic genes (ndk, bkdAA, bkdAB, bkdB, ipdV, and yubB) are predicted to be essential; however, the growth of the corresponding mutants in vivo does not seem to be affected. This discrepancy indicates that there are possibilities of other genes in the genome that replace those specific reactions or the metabolites necessary for cell growth might be imported from the external source of rich media. Subsequently, we hope to see new experimental data generated that will support or refute these hypothesis, which will add further confidence to the network reconstruction or lead to content modifications. The model will continue to be iteratively refined further based on comparison with additional experimental data (Fig. 1). In this way the model can serve as an up-to-date representation of the cumulative knowledge of B. subtilis metabolic capabilities.
In addition to improving the model, this analysis presented herein also illustrates how constraint-based analysis of metabolic networks could be valuable to enhance the quality and utility of high-throughput phenotyping technology that can uncover novel metabolic capabilities (see Fig. 3). For example, the in silico model could predict correctly the cellular respiration event in B. subtilis utilizing acetate as a sole carbon source. Biolog's Phenotype MicroArray TM technology is based on NADH as a reporter system (19), but NADH is an important electron carrier both in respiration and cell growth events. The network could also predict the utilization in B. subtilis with an accuracy of 94% (17 of 18), for the substrates evaluated as lowconfidence data in Biolog phenotyping. The ability to tightly integrate computational modeling approaches with highthroughput experimental data will be important in our on-going quest to mathematically model and simulate complex biological functions, which represents a fundamental goal of systems biology.
Although there is significant progress on transcriptional and phenomic analyses with metabolic models, the integrated analysis of other high-throughput data types and/or simultaneous reconciliation of multiple data sets are in the early stages of development (6). In this study, the Biolog phenotyping dataset could be used effectively for the metabolic reconstruction of B. subtilis, and furthermore the latter could enhance the quality and utility of the former (see above). However, the in silico model could only predict 46% (160 in 350) of the experimental metabolomic compounds reported from the literature (5). At the present time, the annotation of the B. subtilis genome is incomplete, and ϳ40% of its ORFs do not have a functional assignment (2). Thus the genome-scale model developed here might lack some metabolic capabilities that B. subtilis possesses. Thus, the continued genome-wide experiments and comparisons with the model predictions will be an integral part in the further development of in silico B. subtilis strain and represents a key step in the application of systems biology to industrial biotechnology.