A Mathematical Model for the Branched Chain Amino Acid Biosynthetic Pathways of Escherichia coli K12

As a first step toward the elucidation of the systems biology of the model organism Escherichia coli, it was our goal to mathematically model a metabolic system of intermediate complexity, namely the well studied end product-regulated pathways for the biosynthesis of the branched chain amino acids L-isoleucine, L-valine, and L-leucine. This has been accomplished with the use of kMech (Yang, C.-R., Shapiro, B. E., Mjolsness, E. D., and Hatfield, G. W. (2005) Bioinformatics 21, in press), a Cellerator (Shapiro, B. E., Levchenko, A., Meyerowitz, E. M., Wold, B. J., and Mjolsness, E. D. (2003) Bioinformatics 19, 677-678) language extension that describes a suite of enzyme reaction mechanisms. Each enzyme mechanism is parsed by kMech into a set of fundamental association-dissociation reactions that are translated by Cellerator into ordinary differential equations. These ordinary differential equations are numerically solved by Mathematica. Any metabolic pathway can be simulated by stringing together appropriate kMech models and providing the physical and kinetic parameters for each enzyme in the pathway. Writing differential equations is not required. The mathematical model of branched chain amino acid biosynthesis in E. coli K12 presented here incorporates all of the forward and reverse enzyme reactions and regulatory circuits of the branched chain amino acid biosynthetic pathways, including single and multiple substrate (Ping Pong and Bi Bi) enzyme kinetic reactions, feedback inhibition (allosteric, competitive, and non-competitive) mechanisms, the channeling of metabolic flow through isozymes, the channeling of metabolic flow via transamination reactions, and active transport mechanisms. This model simulates the results of experimental measurements.


INTRODUCTION
Systems biology may be broadly defined as the integration of diverse data into useful biological models that allow scientists to easily observe complex cellular behaviors and to predict the outcomes of metabolic and genetic perturbations. As a first step towards the elucidation of the systems biology of the model organism, Escherichia coli, we have elected to limit our initial efforts to the development of a mathematical model for the complex but well-studied metabolic pathways for the biosynthesis of the branched chain amino acids, L-isoleucine, L-valine, and L-leucine.
The biosynthetic pathways for the branched chain amino acids are shown in Figure 1 (1-3). Lthreonine deaminase (TDA), the first enzyme specific for the biosynthesis of L-isoleucine, is end-product inhibited by L-isoleucine, and α-isopropylmalate synthase (IPMS), the first enzyme specific for the biosynthesis of L-leucine, is end-product inhibited by L-leucine. However, because the parallel pathways for L-valine and L-isoleucine biosynthesis are catalyzed by a set of bi-functional enzymes that bind substrates from either pathway, L-valine inhibition of the first enzyme specific for its biosynthesis catalyzed by a single α-acetohydroxy acid synthase (AHAS) could compromise the cell for L-isoleucine biosynthesis or result in the accumulation of a toxic metabolic intermediate, α-ketobutyrate (αKB). This type of a regulatory problem is often solved by using multiple isozymes with different substrate preferences that are differentially regulated by multiple end-products of parallel pathways. In this case, there are three AHAS isozymes that catalyze the first step of the L-valine and the second step of the Lisoleucine pathway (4). AHAS I has substrate preferences for the condensation of two pyruvate molecules required for L-valine biosynthesis, and is end-product inhibited by L-valine (4). AHAS III shows no preference for pyruvate or αKB. While this isozyme can produce intermediates for both L-valine and Lisoleucine, it is inhibited by L-valine (4). The AHAS II isozyme has substrate preferences for the condensation of pyruvate and αKB required for L-isoleucine biosynthesis, and it is not inhibited by any of the branched chain amino acids (4). However, AHAS II is not active in the E. coli. K12 strain (5).
Consequently, this strain cannot grow in the presence of high levels of L-valine unless L-isoleucine is also added to the growth medium (6).
TDA is an allosteric enzyme whose kinetic behavior can be described by the concerted allosteric transition mode of Monod, Wyman, Changeux , the MWC model (7,8). According to the MWC model, TDA can exist in an active state (R) or an inactive state (T) (8,9). The fraction of active enzyme in the R or T states is determined by the concentrations and relative affinities of substrate (L-threonine), inhibitor (L-isoleucine), and activator (L-valine) for the R and T states.
In addition to these regulatory circuits, the intracellular levels of the branched chain amino acids are influenced by the reversible transamination reactions of each pathway. When the intracellular levels of any of the end-product amino acids become high, reverse reactions to their cognate ketoacids are favored; for example, high concentrations of L-valine can be converted to α-ketoisovalerate (αKIV) to supplement L-leucine production. In turn, intracellular amino acid levels can be affected by their active transport from the extracellular growth medium. Therefore, the enzyme reactions required for the active transport of the branched chain amino acids into the cell against a concentration gradient are included in our simulations.

EXPERIMENTAL PROCEDURES The Mathematical Model
Here we use kMech (10), a Cellerator (11) language extension that describes a suite of enzyme reaction mechanisms. Each enzyme mechanism is parsed by kMech into a set of fundamental associationdissociation reactions that are translated by Cellerator into ordinary differential equations (ODEs) that are numerically solved by Mathematica TM (10). To build a model for a metabolic pathway, users need only to string together appropriate kMech enzyme mechanism models and to provide the physical and kinetic parameters for each enzyme. The development of approximation methods for estimating unavailable model parameters, such as forward and reverse rate constants (k f , k r ), from kinetic measurements (K m, k cat ) are described elsewhere (10).
The detailed kMech models for each of the pathway enzymes (Fig. 1)

Carbon Flow Channeling
Traditional modeling approaches use the Michaelis-Menten kinetic equation for one substrate/one product reactions, and the King-Altman method to derive equations for more complex multiple reactant reactions. These types of equations focus on conversion between metabolites (metabolic flux) rather than enzyme mechanisms. While metabolic flux provides valuable information about biomass conversions (12), it cannot simulate, for example, the pathway-specific regulation patterns that control carbon flow channeling through the three AHAS isozymes of the parallel L-isoleucine and L-valine pathways, and the final transamination reactions. This level of mathematical modeling requires a detailed understanding of enzyme kinetic mechanisms and regulatory circuits (Fig. 2) as described below.

α-Acetohydroxy Acid Synthase (AHAS) Isozymes
The AHAS isozymes are controllers of carbon flow into either the L-isoleucine or L-valine biosynthetic pathway. The Ping Pong Bi Bi enzyme mechanism of these isozymes, describes a specialized two-substrate two-product (Bi Bi) mechanism in which the binding of substrates and release of products is ordered. It is a Ping Pong mechanism because the enzyme shuttles between a free and a substratemodified intermediate state indicated as white and shaded ovals, respectively, in Figure 2.
Carbon flow through these isozymes is controlled by the affinities (K m ) of the enzyme intermediates for their second substrates as shown in Figure 2 (13). For example, the AHAS II enzyme reactions shown in Figure 1 are described by the following reactions: 2  The first reaction set is for the condensation of two pyruvate (Pyr) molecules for the biosynthesis of αacetolactate (αAL) of the L-valine and L-leucine pathways. The second reaction set is for the condensation of one Pyr molecule and one α -ketobutyrate (αKB) molecule for the biosynthesis of αacetohydoxybutyrate (αAHB) of the L-isoleucine pathway. As written above, the initial reaction of Pyr with free AHAS II to form the activated enzyme intermediate is represented twice. Therefore, if these reactions were modeled, two molecules of Pyr would be consumed instead of just one for each molecule of Pyr or α KB produced. This redundancy can be resolved by rewriting these reactions as: 2 3 3 In Mathematica TM , the Union operator is used in conjunction with the kMech PingPong models to solve this redundancy of pathways problem. Thus, for channeling of pyruvate through the AHAS II isozyme into L-valine or L-isoleucine, the user kMech inputs in Mathematica TM syntax are: These differential equations and variable definitions are passed to Mathematica TM where they are solved by the numeric solver (NDSolve) function and graphs of enzyme product vs. time are generated.
The Union operator also was used for the modeling of the L-valine inhibited AHAS I and AHAS III isozymes described in the supplemental data of the online version of this article. Detailed descriptions of other kMech models used in this simulation are published elsewhere (10).

Reversible Transamination Mechanism
The Leu → Since the first substrate reaction with glutamate (Glu) is the same for all three of the branched chain α−ketoacid second substrates, the Mathematica TM Union operator is once again used to eliminate this redundancy. Because transamination is reversible, kMech models must be entered in both reaction directions for each of the three branched chain amino acid transaminations, and again the Union operator is used to eliminate the duplicated second substrate reactions (TBNH 2 + aKG ⇔ TBNH 2 ·aKG TB + Glu) of each transamination (gray arrowed lines in Fig. 2 Leu + TB ⇔ TB·Leu → aKIC + TBNH 2 TBNH 2 + aKG ⇔ TBNH 2 ·aKG TB + Glu → These reactions are parsed by kMech into elementary association-dissociation reactions and passed on to Cellerator where they are processed as described above. The same method was used for modeling transaminase C (TC), a reversible Ping Pong Bi Bi mechanism enzyme that uses alanine as the amino donor for the transamination of L-valine (Fig. 2).

Allosteric Regulation
Threonine deaminase (TDA) is an allosteric enzyme whose kinetic behavior can be described by the concerted allosteric transition model of Monod, Wyman, Changeux, the MWC model (7,8).
According to the MWC model, TDA can exist in an active state (R) or an inactive state (T) (8,9). The fraction of enzyme in the R or T state is determined by the concentrations and relative affinities of substrate (L-threonine), inhibitor (L-isoleucine), and activator (L-valine) for each state. This model is described by two equations:

Approximation of Intracellular Enzyme Concentrations
With few exceptions, intracellular enzyme concentrations are not available. However, with careful experimental documentation, these concentrations can be approximated from the yields and specific activities of purified enzymes. For example, calculations based on purification tables in the literature suggest that the intracellular concentration of TDA is 4 µM (16). Furthermore, recent experiments have shown a positive correlation between mRNA levels measured with DNA microarrays and protein abundance in both E. coli (17) and yeast cells (18,19). Thus, the intracellular levels of the remaining enzymes of the branched chain amino acid biosynthetic pathway can be inferred from the calculated intracellular level of TDA and the relative mRNA levels of the other branched chain amino acid biosynthetic enzymes using DNA microarray data (20). The data in Table 1 of the Supplemental Data in the online version of this article demonstrate that this is a reasonable method. Indeed, simulations using intracellular enzyme concentrations inferred in this manner produce experimentally observed steady state pathway intermediate, and end-product levels (21,22), usually within two-fold to one-half adjustments of these inferred values.

Optimization of Model Parameters
A list of reported enzyme kinetic and physical parameters needed to solve the differential equations for the simulations reported here and their literature sources are available in Table 1 of the Supplemental Data in the online version of this article. The optimized values to simulate known steady state intracellular levels of pathway substrates, intermediates, and end-products are also listed for comparison. In brief, for each enzyme, there are at least three parameters needed: total enzyme concentration (E T ), K m for each substrate and k cat for each enzyme reaction. For enzymes with additional regulatory mechanism, extra parameters, such as K i for each inhibitor and K a for each activator, also are required.
In initial simulation, E T values were inferred from Microarray data as described above; K m and k cat values were obtained from in vitro enzyme kinetic data of purified enzymes with the exception of Transaminase C (TC) and α-Isopropylmalate Isomerase (IPMI) where empirical values were used due to a lack of experimental data. These values were manually adjusted to match the published in vivo steady state levels of intermediate and endproduct metabolites (21,22). Interestingly, the inferred E T and in vitro K m values work quit well, since the adjustments are usually within two-fold to one-half of the initial values. However, since many variables can influence in vitro measurements including the relative activities of purified enzymes, larger corrections were sometimes necessary for the estimation of k cat values (5 out of 9 enzymes). Once the mathematical model was optimized with the parameters reported in Table 1 of the Supplemental Data of the online version of this article, it was used without further adjustment for the simulations of metabolic and genetic perturbations reported below.

Amino Acid Biosynthetic Pathways of E. coli K12
The three interacting metabolic pathways simulated here consist of eleven enzymes, eighteen metabolic intermediates, and three enzyme cofactors. The mathematical model for this metabolic system consists of 105 ordinary differential equations (ODEs), with 110 association and dissociation rate constants, and 52 catalytic rate constants. The enzymes of these interacting pathways employ three distinct enzyme mechanisms (simple catalytic, Bi Bi, and Ping Pong Bi Bi) that are regulated by allosteric, competitive, or noncomptetitive inhibition mechanisms. As described in the Methods section, the physical parameters for these models have been obtained directly from the literature, calculated from data in the literature, or estimated by fitting experimental data (Table 1 of the Supplemental Data of the online version of this article). Relative intracellular enzyme levels have been inferred from enzyme purification and DNA microarray data (20).
The steady state levels for the thirteen pathway intermediates and end-products are shown in Figure 3. Steady-state enzyme activity levels were optimized to properly channel the steady-state flow of intermediates through these pathways to match reported in vivo levels of pathway intermediates and endproducts (21,22). The detailed kMech inputs, corresponding ODEs, kinetic rate constants, and initial conditions for solving the ODEs are presented in Figure 1 of the supplemental data available in the online version of this article.

Allosteric Regulation of L-Threonine Deaminase (TDA)
The allosteric regulatory mechanism of TDA was simulated with the MWC model employing physical parameters based on the literature or optimized to fit experimental data (10). The data in Figure 3 show that TDA produces αKB at a steady state level comparable to that observed in vivo (21,22). Since the K i for L-isoleucine (15 µM) is much less than the K a for L-valine (550 µM), an initial decrease in the production of αKB as L-isoleucine accumulates is followed by an increase to a final steady level that accompanies the accumulation of L-valine (Fig. 3). Correspondingly, the fraction of active TDA is initially decreased as L-isoleucine accumulates and countered to a steady level (5.5% of the total enzyme is in the active R state) while L-valine accumulates ( Figure 4A). A Similar pattern was observed for the fractional saturation of TDA with L-threonine (v 0 /V max ) in response to changes in the levels of its effector ligands, L-isoleucine and L-valine. At its steady state level, TDA is only about 1.2% saturated with Lthreonine ( Figure 4B).

Regulation of the α-Acetohydroxyacid Synthase (AHAS) Isozymes
The two-substrate, two-product, AHAS isozymes I and III employ a Ping Pong Bi Bi enzyme mechanism described in the Methods section [the AHAS II isozyme is inactive in E. coli K12 (5)]. The L-valine inhibition of AHAS I and III is noncompetitive and, in the case of AHAS III, is incomplete since 15-20% of the activity attained at saturating substrate concentrations (V max ) remains in the presence of saturating L-valine concentrations (13). The data in Figure 3 show that the production of αAL produced by AHAS isozymes I and III decreases as L-valine accumulates. These data also show that αAHB, primarily produced by AHAS isozyme III, decreases to a steady state level as its end-product inhibitor (Lvaline) accumulates, and as its substrate, αKB, decreases because L-isoleucine accumulates and inhibits TDA ( Figure 1).

Accumulation, not L-isoleucine Starvation
It is well known that adding L-valine at a final concentration of 1 mM to a culture of E. coli K12 cells growing in a glucose minimal salts medium inhibits their growth, and that this L-valine inhibition can be reversed by L-isoleucine addition (6). Since the AHAS I and AHAS III isozymes of E. coli K12 strains are inhibited by L-valine, and since the ilvG gene for AHAS II in E. coli K12 strains contains a frameshift mutation that destroys AHAS II activity (5), it was assumed that L-valine inhibition of AHAS I and AHAS III might inhibit growth by interfering with L-isoleucine biosynthesis. However, later studies demonstrated that the intracellular L-isoleucine level is not suppressed by L-valine because its biosynthesis is sustained, even at saturating L-valine concentrations, by AHAS III that remains 15-20% active (13,23) and by the L-valine activation of TDA that shuttles more substrate into the L-isoleucine pathway. Indeed, the simulation in Figure 5A shows that, in the presence of extra-cellular L-valine, the intra-cellular L-isoleucine level in fact accumulates nearly nine-fold ( Figure 5A). At the same time, the pathway precursor of L-isoleucine, αKB, increases about four-fold ( Figure 5B). This build-up of αKB is caused by L-valine activation of TDA ( Figure 5C) which increases its production, and L-valine inhibition of the AHAS I and AHAS III isozymes, which reduces its consumption. It is now known that this αKB accumulation is toxic to cells because of its ability to inhibit the glucose PTS transport system (24,25). Thus, as reproduced by our simulations, L-valine growth inhibition of Escherichia coli K12 is not a consequence of L-isoleucine starvation.
The simulation results in Figure 5B show that the growth inhibiting effects of L-valine induced αKB accumulation can be reversed by L-isoleucine by its ability to inhibit TDA activity. This simulation shows that, in the presence of 1 mM L-valine, the level of αKB increases around four-fold; and that in the presence of 500 µM L-isoleucine, αKB levels are reduced to the control level observed in the absence of L-valine. The simulation results in Figure 5C show that, concomitant with the rise in αKB observed in the presence of 1 mM L-valine, nearly 18% of the cellular TDA is converted to the active R state. However, concomitant with the decrease in αKB observed in the presence of 500 µM L-isoleucine, the cellular TDA in the active R state is reversed to the control level observed in the absence of L-valine. These simulations are verified by experimental results accumulated from multiple laboratories over a three decade period (6,24,25).

Simulating the Metabolic Engineering of an L-Isoleucine Over-producing E. coli K12
Strain.
An obvious goal of modeling biological systems is to facilitate metabolic engineering for the commercial production of specialty chemicals such as amino acids. In the past, this has been largely accomplished by genetic manipulation and selection methods. For example, a common strategy to overproduce an amino acid has been to isolate a strain with a feedback resistant mutation in the gene for the first enzyme for the biosynthesis of that amino acid. Here we use our model to determine the effects of a feedback resistant TDA for the over-production of L-isoleucine. We can simulate a TDA feedback resistant mutant strain (TDA R ) by increasing the K i for L-isoleucine to a large number, (e.g. 100,000 µM).
The simulation in Figure 6A shows that in the absence of L-isoleucine inhibition, the activator and substrate ligands drive nearly 100% of cellular feedback resistant TDA R to the active R state compared to the wild type enzyme that is only 6% present in the active R state. However, in spite of this increased amount of enzyme in the active state, the data in Figure 6B show that AHAS III is able to support only a 5 to 6 fold increase in L-isoleucine production in a feedback resistant E. coli K12 compared to a wild type strain. At the same time, the steady state level of the AHAS III substrate, αKB, is increased about fortyfold ( Figure 6C). This is because E. coli K12 does not have an active AHAS II isozyme that favors the condensation of pyruvate and αKB for L-isoleucine production; thus, αKB would accumulate to toxic levels. These simulation results suggest that in order to over-produce L-isoleucine, the αKB accumulation must be reduced. The results in Figure 6D show that restoring a wild type AHAS II isozyme and simulating an attenuator mutation that elevates the levels of all of the enzymes of the L-isoleucine and L-13 by guest on July 10, 2020 http://www.jbc.org/ Downloaded from valine parallel pathways 11-fold (26), both avoids buildup of αKB and subsequent pathway intermediates ( Figure 6C), and results in a forty-fold increase in L-isoleucine production. These simulated results that show that high level overproduction of L-isoleucine in E. coli requires a functional AHAS II isozyme and a de-attenuated genetic background (ilvGMEDA-att -) agree with experiments performed by Hashiguchi et al. of the Ajinomoto Co., Tokyo, Japan (27).

DISCUSSION
In this report, we describe a mathematical simulation of branched chain amino acid biosynthesis and regulation in E. coli. This approach involves the following steps: 1. The identification of all the molecular participants including enzymes, metabolites, and coenzymes, as well as the enzyme kinetic and regulatory mechanism of each enzyme (defined in Figure 1 and Table 1  2. The development of approximation methods for unavailable model parameters. For example: the approximation of rate constants (k f , k r ) from kinetic measurements (K m, k cat ) described by Yang et al. (10); the approximation of k cat from the activity of purified enzymes; and the approximation of intracellular enzyme concentrations (E T ) from enzyme purification and DNA microarray data.
3. The use of the information obtained in steps one and two to create, as accurately as possible, calculation-independent, models that describe the catalytic and regulatory mechanisms of each enzymatic step (kMech). 14 by guest on July 10, 2020 http://www.jbc.org/ Downloaded from 4. Stringing together appropriate kMech models and providing the physical and kinetic parameters for each enzyme in the pathway. 5. The generation of ordinary differential equations to describe each enzyme mechanism in terms of fundamental molecular interactions (Cellerator). 6. The optimization of model parameters to simulate known steady state intracellular levels of pathway substrates, intermediates, and end-products. set of non-linear differential equations to linear algebra equations (33). Therefore, the kinetic model based on this approach has embedded the steady-state hypothesis. In contrast, the model generated by kMech/Cellerator consists of non-simplified, non-linear differential equations that describe the rates of change of each reactant in the model over time. To build a pathway model, users only need to call upon kMech models for the enzyme mechanisms of a pathway without writing any differential equations.
Because of this simple user input and the integration of kMech, Cellerator, and Mathematica TM , human errors are greatly reduced (10). To allow kMech/Cellerator to be utilized by an audience with little or no programming experience, a java-based graphical user interface (GUI) is under development. This graphical editor is designed to help users to construct pathways, select enzyme mechanisms, and enter required physical and kinetic parameters with simple point-and-click methods.
In contrast to "top down' metabolic flux balance analysis (FBA) methods (34), which provide valuable information about biomass conversions without knowing individual enzyme mechanism and pathway-specific regulation patterns (12), the kMech/Cellerator models described here represent a "bottom up" approach to an understanding of complex metabolic networks. The model presented here is incomplete for many reasons, primarily, because it does not exist in the context of the bacteria cell. In addition to the metabolic regulatory mechanisms considered here, carbon flow through metabolic pathways is affected by a hierarchy of additional controls of gene expression levels that affect pathway enzyme activities and amounts. These hierarchical levels of control, from the most general to the most specific, are: (i) global control of gene activity mediated by chromosome structure (3), (ii) global control of the genes of stimulons and regulons (35), and (iii) operon-specific controls. The first, or highest, level of control is exemplified by DNA topology-dependent mechanisms that coordinate basal level expression of all of the genes of the cell (independent of operon-specific controls). This level is mediated by DNA architectural proteins and the actions of topoisomerases in response to nutritional and environmental growth conditions (3). The second level of control is mediated by site-specific DNA binding proteins, which, in cooperation with operon-specific controls, regulate often overlapping groups of metabolically related operons in response to environmental or metabolic transitions or stress conditions (35). The third level of control is mediated by less abundant regulatory proteins that respond to operon-specific signals and bind in a site-specific manner to one or a few DNA sites to regulate single operons. Each of these levels of control impacts metabolic regulation by influencing enzyme levels. Thus, a complete model of branched chain amino acid biosynthesis in E. coli must include these higher levels of gene regulation. To incorporate these higher levels of regulation we are currently developing a set of models that describe the genetic regulatory mechanisms that control the operons of the ilv regulon. To these ends, we face new challenges. For example, while the ordinary differential equations we are using for metabolic pathways are a deterministic and continuous approximation for an average representation of interactions between large numbers of discrete molecules (e.g. enzymes and metabolites), McAdams and Arkin point out that because each cell contains only one gene/operon there can be large differences in the time between successive events in regulatory cascades across a cell population that can produce probabilistic outcomes (36). To address this and other issues, we are currently working on another software package for genetic regulatory mechanisms, gMech, that implements stochastic simulation algorithms such as Gillespie's algorithm (37), and the Langevin equation (38) that accommodate stochastic noise. This gMech software package will contain models for genetic regulatory mechanisms such as attenuation, activation and repression, as well as DNA topological controls. Therefore, the work presented here should be considered as a first step of a "bottom up" approach that integrates biochemical information from the literature and bioinformatics databases, and relative gene expression data from DNA microarrays to build a self-regulated metabolic pathway in E. coli. As high throughput technologies for genomics, proteomics and metabolomics grow, we expect that a similar approach will soon be feasible in higher organisms.