Global Gene Expression Profiling in Escherichia coli K12

The ArcAB two-component system of Escherichia coli regulates the aerobic/anaerobic expression of genes that encode respiratory proteins whose synthesis is coordinated during aerobic/anaerobic cell growth. A genomic study of E. coli was undertaken to identify other potential targets of oxygen and ArcA regulation. A group of 175 genes generated from this study and our previous study on oxygen regulation (Salmon, K., Hung, S. P., Mekjian, K., Baldi, P., Hatfield, G. W., and Gunsalus, R. P. (2003) J. Biol. Chem. 278, 29837–29855), called our gold standard gene set, have p values <0.00013 and a posterior probability of differential expression value of 0.99. These 175 genes clustered into eight expression patterns and represent genes involved in a large number of cell processes, including small molecule biosynthesis, macromolecular synthesis, and aerobic/anaerobic respiration and fermentation. In addition, 119 of these 175 genes were also identified in our previous study of the fnr allele. A MEME/weight matrix method was used to identify a new putative ArcA-binding site for all genes of the E. coli genome. 16 new sites were identified upstream of genes in our gold standard set. The strict statistical analyses that we have performed on our data allow us to predict that 1139 genes in the E. coli genome are regulated either directly or indirectly by the ArcA protein with a 99% confidence level.

Escherichia coli thrives in the gastrointestinal tract of many warm-blooded animals as a commensal or as a pathogen depending on a strain-dependent complement of genes (2). These enteric bacteria have the ability to switch between aerobic and anaerobic growth if oxygen is limiting. In response to microenvironments in the host, each individual cell adjusts its metabolic pathways to optimize energy generation via aerobic and/or anaerobic respiration or by fermentation of simple sugars (3). Many other cellular functions also are adjusted in response to oxygen availability, such as alterations in gene expression levels of membrane-associated nutrient uptake and/or excretion systems, biosynthetic pathways, and macromolecular synthesis (3).
Expression of E. coli genes involved in oxygen utilization is down-regulated as oxygen is depleted, and in a reciprocal fashion, expression of genes encoding alternative anaerobic electron transport pathways or genes needed for fermentation is switched on. Many of these metabolic transitions are controlled at the transcriptional level by the activities of the fumarate nitrate reductase global regulatory protein FNR and/or the two-component ArcAB regulatory system (4,5). The role of the FNR protein in the global control of E. coli gene expression has been profiled in response to anaerobiosis (1). Based on this analysis of whole genome transcription data, it was estimated that the expression of over one-third of the genes expressed during growth under aerobic conditions is altered when E. coli cells transition to an anaerobic growth state and that the expression of half of these genes is modulated either directly or indirectly by FNR. Thus, the fnr gene family was estimated to be ~10-fold larger than the 70 members previously recognized as members of the fnr gene regulatory network (6,7).
The ArcAB (aerobic respiratory control) two-component regulatory system is recognized as a second global regulator of anaerobic gene regulation (3,6,8). The ArcAB system is composed of a classical OmpR-like receiver regulator, ArcA, and a membrane-associated sensor transmitter protein, ArcB (6). Together, these components have been shown to regulate expression of oxygen-requiring pathways, including the tricarboxylic acid cycle (e.g. sdhCDAB, icd, fumA, mdh, gltA, acnA, and acnB), and the aerobic cytochrome oxidase complexes (9-18). ArcAB is also known to be required for proper expression of certain catabolic genes for pyruvate utilization and sugar fermentation (19-21).
In this genome-based study, we have identified additional E. coli genes under oxygen control that are differentially expressed in response to the ArcA global regulatory protein. This was accomplished by the use of DNA microarrays to analyze gene expression profiles in E. coli cells cultured at steady-state growth rates under aerobic (+O2) or anaerobic (−O2) growth conditions and in cells cultured under anaerobic growth conditions in the presence (−O2, +ArcA) or absence (−O2, −ArcA) of the ArcA protein, i.e. in otherwise isogenic arcA+ and arcA− strains. These experiments show that about one-half of the genes whose expression levels are affected by aerobic to anaerobic transitions are also affected by the ArcA protein. Thus, the number of E. coli genes differentially regulated by the ArcA protein is much larger than the 30 (5) or 100 (22) genes/operons previously recognized. The results of the gene expression profiling experiments further show that as many as two-thirds of the genes whose expression levels are affected by the ArcA protein are also affected by the FNR protein (1).

MATERIALS AND METHODS
Chemicals and Reagents-Avian myeloblastosis virus reverse transcriptase and Sephadex G-25 Quickspin columns were obtained from Roche Applied Science. Phenol and the DNA-free kit were purchased from Ambion Inc. Ribonuclease inhibitor III was purchased from Pan-Vera/Takara. Ultrapure deoxynucleoside triphosphates were purchased from Amersham Biosciences. Random hexamer oligonucleotides and T4 polynucleotide kinase were obtained from New England Biolabs Inc., and [α-33P]dCTP (2000-3000 Ci/mmol) was obtained from PerkinElmer Life Sciences. DNA filter arrays (Panorama E. coli gene arrays) were obtained from Sigma. SYBR Gold was purchased from Molecular Probes, Inc. All other chemicals were obtained from Sigma. All reagents and baked glassware used in RNA manipulations were treated with diethyl pyrocarbonate prior to their use.
Total RNA Isolation, cDNA Synthesis, and Target Labeling Conditions-Total RNA was isolated from 10-ml cultures; cDNA was synthesized and labeled with [α-33P]dCTP; and filters were hybridized exactly as described by Hung et al. (25). Stripping and reusing filters four times as described here results in a <3% increase in variance (26).
Data Acquisition-The commercial software package DNA Array-Vision obtained from Research Imaging Inc. was used to grid the 16-bit image file obtained from a PhosphorImager, to record the pixel density of each of the 18,432 addresses on each filter, and to perform the background subtractions. 8580 of the addresses on each filter were spotted with duplicate copies of each of the 4290 E. coli open reading frames (ORFs). The remaining 9852 empty addresses were used for background measurements. Because the backgrounds were constant, a global average background measurement was subtracted from each experimental measurement, although local background calculations are possible.
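The global background correction described above can be illustrated with a minimal sketch. It is not the DNA ArrayVision procedure itself: the function name, the example pixel densities, and the choice to clip negative values to zero (so that a zero denotes a signal at or below background) are assumptions made for illustration.

```python
from statistics import mean


def subtract_global_background(spot_signals, empty_signals):
    """Subtract the mean signal of the empty addresses from every spot.

    Values at or below background are clipped to zero (an assumption of
    this sketch), so a zero means "not detectably above background".
    """
    background = mean(empty_signals)
    return [max(signal - background, 0.0) for signal in spot_signals]


# Hypothetical pixel densities, for illustration only.
spots = [120.0, 95.0, 310.0, 101.0]
empties = [98.0, 102.0, 100.0]
corrected = subtract_global_background(spots, empties)
print(corrected)
```

A local background correction would instead estimate the background separately for each region of the filter; the global form is sufficient only when, as stated above, the background is constant across the membrane.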
Experimental Design-The experiments described here (Fig. 1) were performed at the same time as our previously reported experiments profiling gene expression levels in the presence or absence of oxygen and FNR (1). The data for strain MC4100 (ArcA+) grown aerobically (Experiment 1, Filters 1 and 2) and anaerobically (Experiment 2, Filters 3 and 4) have been reported by Salmon et al. (1). For Experiment 3, Filters 5 and 6 were hybridized with random hexamer-generated 33P-labeled cDNA fragments complementary to each of three independently prepared RNA preparations (RNA 25-27) obtained from three individual cultures of strain PC35 (arcA−) grown under anaerobic conditions. These three 33P-labeled cDNA target preparations were pooled prior to hybridization to the full-length ORF probes on the filters (Experiment 3, Replicate 1, Filters 5 and 6). Following PhosphorImager analysis, these filters were stripped and again hybridized with pooled 33P-labeled cDNA target fragments complementary to each of another three independently prepared RNA preparations (RNA 28-30) from the same strain (PC35; Experiment 3, Replicate 2). This procedure was repeated one more time with Filters 5 and 6 with yet another independently prepared pool of cDNA targets (Experiment 3, Replicate 3; RNA 31-33). The data for the fourth replicate of this experiment were lost.
This experimental design produced duplicate filter data for four replicates performed with cDNA targets complementary to four independent sets of pooled RNA preparations for Experiments 1 and 2. Thus, because each filter contained duplicate spots for each ORF and duplicate filters were used for each experiment, a total of 16 measurements were obtained, four measurements for each ORF from each of four replicates. Duplicate filter data were obtained for three replicates performed with cDNA targets complementary to three independent sets of pooled RNA preparations for Experiment 3. Thus, because each filter contained duplicate spots for each ORF and duplicate filters were used for each experiment, a total of 12 measurements were obtained, four measurements for each ORF from each of three replicates.
Statistical Analyses-Data processing and statistical methods implemented in the Cyber-T software used for the analysis and interpretation of the data obtained from the DNA microarray experiments described in this study were the same as those described previously by Salmon et al. (1). For each target signal, a background-subtracted estimate of the expression level was obtained and scaled to total counts on the membrane by dividing each individual gene expression value by the total of all target signals on the membrane. Thus, each normalized gene level is expressed as a fraction of the total mRNA hybridized to each DNA array. For any given measurement, a value greater than zero (indicating an expression level) or a zero (indicating an expression level lower than background) was obtained. Only those genes exhibiting an expression level greater than zero in all replicates were used for statistical analysis. These gene expression level measurements were analyzed by a regularized t test based on a Bayesian statistical framework (25-29). For analysis of the data reported here, we ranked the mean gene expression levels of the replicate experiments in ascending order, used a sliding window of 101 genes, and assigned the average S.D. of the 50 genes ranked below and above each gene as the Bayesian S.D. for that gene. The p values for each gene measurement based on a regularized t test with a confidence value of 10 are reported in the Supplemental Material. A comprehensive discussion of the use of a regularized t test and the modifications applicable to the analysis of DNA microarray data of the type presented here is presented elsewhere (26).
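The normalization and sliding-window steps described above can be sketched as follows. This is an illustrative reading, not the Cyber-T source code. The function names are hypothetical, the window is taken to include the gene itself, and the variance formula (a weighted combination of the window-derived background variance and the empirical variance, with the confidence value acting as a prior weight) is an assumed form of the regularized estimate; the cited references (25-29) should be consulted for the exact expression.

```python
import math
from statistics import mean


def normalize_to_total(signals):
    """Express each background-subtracted signal as a fraction of the total."""
    total = sum(signals)
    return [s / total for s in signals]


def window_background_sd(means, sds, half_window=50):
    """Average the S.D. over a window of genes ranked by mean expression.

    With half_window=50 the window spans 101 genes (fewer at the ends of
    the ranking). Including the gene itself is an assumption of this sketch.
    """
    order = sorted(range(len(means)), key=lambda i: means[i])
    background = [0.0] * len(means)
    for rank, gene in enumerate(order):
        lo = max(0, rank - half_window)
        hi = min(len(order), rank + half_window + 1)
        background[gene] = mean(sds[order[r]] for r in range(lo, hi))
    return background


def regularized_variance(sample_sd, n, background_sd, confidence=10):
    """Assumed form: (v0*sd0^2 + (n-1)*s^2) / (v0 + n - 2)."""
    return (confidence * background_sd ** 2 + (n - 1) * sample_sd ** 2) / (
        confidence + n - 2
    )


def regularized_t(mean_a, mean_b, var_a, var_b, n_a, n_b):
    """t statistic computed from the regularized variances of two conditions."""
    return (mean_a - mean_b) / math.sqrt(var_a / n_a + var_b / n_b)
```

The design point is that with only a few replicates per gene the empirical S.D. is unstable, so it is shrunk toward the S.D. typical of genes expressed at a similar level; a larger confidence value gives the window-derived estimate more weight.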
Gene measurements containing zero expression values in one or more replicates were set aside. Among this set of genes, those with zero expression values for all replicates in one experiment and all values greater than zero for all measurements of another experiment were identified. Because these gene measurements could not be analyzed with a t test, the significance of these results was evaluated by ranking these genes in ascending order according to the coefficients of variation of the four greater than zero measurements of each experiment.
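As a small illustration of this ranking, the sketch below computes the coefficient of variation (sample S.D. divided by the mean) for each gene and sorts the genes in ascending order. The gene labels and values are hypothetical, and the use of the sample (n - 1) S.D. is an assumption.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)


def rank_by_cv(measurements):
    """Return gene names sorted from the least to the most variable."""
    return sorted(measurements, key=lambda g: coefficient_of_variation(measurements[g]))


# Hypothetical non-zero measurements for two genes.
example = {"geneB": [10.0, 20.0, 5.0, 15.0], "geneA": [10.0, 11.0, 9.0, 10.0]}
ranking = rank_by_cv(example)
```

Genes at the top of such a ranking have the most reproducible non-zero measurements and are therefore the most credible on/off candidates.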
Cyber-T employs a mixture model-based method described by Allison et al. (30) for the computation of the global false positive and false negative levels inherent in a DNA microarray experiment (25,26). With this method, described by Hung et al. (25), we estimated the rates of false positives and false negatives as well as true positives and true negatives at any given p value threshold. In other words, we obtained a posterior probability of differential expression PPDE(p) value for each gene measurement and a PPDE(<p) value at any given p value threshold based on the experiment-wide global false positive level and the p value exhibited by that gene (25,26). In most instances, PPDE(<p) values are reported below and in Tables I-VIII. However, both PPDE(p) and PPDE(<p) values are given for each gene in the Supplemental Material.
It is expected that for each p value threshold, there is a tradeoff between the rates of true and false positives. A low, conservative p value threshold leads to few false positives, but may also reduce the true positive rate. A large p value threshold ultimately allows one to recover all the true positives, but at the cost of increasing the false positive rate. This fundamental tradeoff is usually captured in statistics using a receiver operating characteristic curve obtained by plotting the true positive rate (or sensitivity) defined by true positive/(true positive + false negative) versus the false positive rate defined by false positive/(false positive + true negative) (87). For instance, at a 77% true positive rate, we expect a 5% false positive rate when Experiment 1 (+O2, +ArcA) is compared with Experiment 2 (−O2, +ArcA) (Fig. 2A), and at an 80% true positive rate, we expect a 5% false positive rate when Experiment 2 (−O2, +ArcA) is compared with Experiment 3 (−O2, −ArcA) (Fig. 2B).
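A single point on a receiver operating characteristic curve can be computed from the four confusion counts at one p value threshold. The sketch below uses the standard definitions (sensitivity and 1 - specificity); the counts are hypothetical and chosen only so that the result reproduces the 77%/5% example quoted above.

```python
def roc_point(true_pos, false_pos, true_neg, false_neg):
    """Return (true positive rate, false positive rate) for one threshold."""
    tpr = true_pos / (true_pos + false_neg)   # sensitivity
    fpr = false_pos / (false_pos + true_neg)  # 1 - specificity
    return tpr, fpr


# Hypothetical counts: 100 truly changed and 100 truly unchanged genes.
tpr, fpr = roc_point(true_pos=77, false_pos=5, true_neg=95, false_neg=23)
```

Repeating the calculation over a range of thresholds and plotting the true positive rate against the false positive rate traces the full curve.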
The Cyber-T software package is available at the web site for the Institute for Genomics and Bioinformatics at the University of California (Irvine, CA). The clustering methods used to determine the regulatory patterns reported below are those implemented in the GeneSpring™ software package (Silicon Genetics, Redwood City, CA).
Data Accession-All raw and processed data for the experimental results reported here are provided in tabular format as Excel files in the Supplemental Material.

Differential Gene Expression in the Presence or Absence of Oxygen
In the following discussions, we often refer to the -fold change for differentially expressed genes. However, simple -fold changes are incomplete and can be misleading (26). In Tables I-IX, we report only p values, PPDE(<p) values, and -fold changes.
A comparison of the wild-type E. coli gene expression levels between cells grown in the presence and absence of oxygen revealed 2820 genes that exhibited expression levels above the background for all replicates of Experiments 1 and 2 (+O2, +ArcA versus −O2, +ArcA) (Fig. 1) (1). The statistical analysis of these data revealed that approximately one-half of the genes expressed during aerobic growth (1445 genes) were differentially expressed following a transition from aerobic to anaerobic growth with a p value of 0.05 and a PPDE(<p) value of 0.96. Therefore, 58 of these 1445 differentially expressed genes are expected to be false positives (25).
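The false positive estimate follows directly from the PPDE(<p) value: if a fraction PPDE(<p) of the genes called at a threshold are truly differentially expressed, the remainder are expected to be false positives. A minimal sketch (the function name is hypothetical):

```python
def expected_false_positives(n_called, ppde_below_p):
    """Expected number of false positives among the genes called at a threshold."""
    return n_called * (1.0 - ppde_below_p)


estimate = expected_false_positives(1445, 0.96)
print(round(estimate))
```

For 1445 genes at PPDE(<p) = 0.96 this gives 1445 × 0.04 = 57.8, i.e. about 58 genes, in agreement with the figure quoted above.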

Differential Gene Expression in the Absence of Oxygen and in the Presence and Absence of the ArcA Global Regulatory Protein
A comparison of the gene expression levels between cells grown in the absence of oxygen and in the presence or absence of ArcA revealed 2264 genes that exhibited expression levels above the background for all replicates of Experiments 2 and 3 (−O2, +ArcA versus −O2, −ArcA) (Fig. 1). Again, about one-half of the gene expression levels were modulated by this treatment condition. An examination of the distribution of p values suggested that the expression levels of 1243 genes with p values <0.05 were modulated either directly or indirectly by ArcA during growth transition from aerobic to anaerobic con-

Identification of Differential Gene Expression Patterns Resulting from Two-variable Perturbation Experiments
To identify the global changes and adjustments of gene expression patterns that facilitate a transition from aerobic to anaerobic growth conditions and to determine the effects of genotype on these gene expression patterns, we analyzed E. coli gene expression profiles obtained from cells cultured under aerobic (+O2) or anaerobic (−O2) growth conditions and under anaerobic growth conditions in the presence (−O2, +ArcA) or absence (−O2, −ArcA) of ArcA, the global regulatory protein for anaerobic metabolism. Because ArcA is presumed to be inactive under aerobic conditions (5,6,31), we did not perform experiments comparing arcA genotypes under aerobic conditions. Only two general regulatory patterns can be observed when two experimental conditions are compared, e.g. growth in the presence or absence of oxygen. However, when three conditions are compared, at least eight general regulatory patterns are expected. The data in Fig. 3 diagram the eight basic regulatory patterns that could be observed among three experiments conducted in the presence and absence of oxygen in an arcA+ strain and in the absence of oxygen in an arcA− strain. For simplicity, only three expression levels for each of these three experimental conditions were assumed: low, medium, and high.
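The count of eight patterns can be checked by enumeration. Three experiments give two comparisons (aerobic versus anaerobic in the arcA+ strain, and arcA+ versus arcA− under anaerobiosis); each comparison can show a decrease, no change, or an increase, and the single combination in which nothing changes is not a regulatory pattern. The sketch below lists the combinations without assigning them the numerals of Fig. 3, because the figure is not reproduced here.

```python
from itertools import product

CHANGES = ("decrease", "no change", "increase")


def regulatory_patterns():
    """List (oxygen effect, arcA effect) pairs, excluding 'nothing changes'.

    The first element is the change on going from aerobic to anaerobic
    growth; the second is the change on going from arcA+ to arcA-.
    """
    return [
        (oxygen, arca)
        for oxygen, arca in product(CHANGES, repeat=2)
        if (oxygen, arca) != ("no change", "no change")
    ]


patterns = regulatory_patterns()
print(len(patterns))
```

The 3 × 3 = 9 combinations minus the unchanged one leave the eight patterns referred to in the text.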
To identify genes differentially expressed at a high confidence level that correspond to each of the patterns (I-VIII) diagrammed in Fig. 3, the genes differentially expressed due to the treatment conditions of Experiments 1 and 2 were sorted in ascending order according to their p values based on the regularized t test as described under "Materials and Methods." Next, the genes differentially expressed due to the treatment conditions of Experiments 2 and 3 were sorted in ascending order according to their p values. 100 genes with the lowest p values present in both lists were selected. These genes exhibited either an increased or decreased expression level between both treatment conditions (i.e. between Experiments 1 and 2 and between Experiments 2 and 3) (Fig. 3).
To identify those genes differentially expressed at a high level of confidence under the treatment conditions of Experiments 1 and 2 but expressed at the same or similar levels under the treatment conditions of Experiments 2 and 3 (patterns III and IV) (Fig. 3), the 500 genes of Experiments 1 and 2 with the highest probability for differential expression values were compared with the 500 genes of Experiments 2 and 3 with the lowest probability for differential expression values. This comparison identified 40 genes that were present in both lists, i.e. genes whose regulatory patterns fulfill this criterion. Likewise, to identify those genes differentially expressed under the treatment conditions of Experiments 2 and 3 but expressed at the same or similar levels under the treatment conditions of Experiments 1 and 2 (patterns VI and VIII) (Fig. 3), the 500 genes of Experiments 2 and 3 with the highest probability for differential expression values were compared with the 500 genes of Experiments 1 and 2 with the lowest probability for differential expression values. This comparison identified 35 genes that were present in both lists. These gene lists were combined into a single list of 175 genes differentially expressed under at least one treatment condition. All of the differentially expressed genes of this list exhibited p values <0.00013 and a global confidence based on the experiment-wide false positive level of 99% (PPDE(<p) = 0.99). They constitute the "gold standard" gene set for the following analyses.
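The list comparisons used to assemble the gold standard set amount to set intersections of ranked gene lists. The sketch below shows the "changed in one comparison only" selection; the function names and the toy PPDE values are hypothetical, and the list size is a parameter (500 in the text).

```python
def top_k(scores, k, highest=True):
    """Return the k genes with the highest (or lowest) score as a set."""
    ranked = sorted(scores, key=scores.get, reverse=highest)
    return set(ranked[:k])


def changed_in_first_only(ppde_first, ppde_second, k=500):
    """Genes among the k most confidently changed in the first comparison
    and among the k least likely to be changed in the second comparison."""
    return top_k(ppde_first, k, highest=True) & top_k(ppde_second, k, highest=False)


# Toy PPDE values for four hypothetical genes in each comparison.
ppde_exp1_vs_2 = {"a": 0.99, "b": 0.98, "c": 0.10, "d": 0.50}
ppde_exp2_vs_3 = {"a": 0.05, "b": 0.97, "c": 0.99, "d": 0.40}

oxygen_only = changed_in_first_only(ppde_exp1_vs_2, ppde_exp2_vs_3, k=2)
arca_only = changed_in_first_only(ppde_exp2_vs_3, ppde_exp1_vs_2, k=2)
```

Applied to the real data, the two selections yielded 40 and 35 genes, which together with the 100 genes changed in both comparisons give the 100 + 40 + 35 = 175 genes of the gold standard set.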

Hierarchical Clustering and Principal Component Analysis
GeneSpring™ software was used to empirically determine parameters for hierarchical clustering of these 175 genes into the eight patterns of Fig. 3 as discussed by Salmon et al. (1) and shown in Fig. 4. Interestingly, 83 of these ArcA-regulated genes are also differentially regulated directly or indirectly by FNR (patterns I, II, and V-VIII) (1). As an independent test to corroborate the accuracy of this supervised hierarchical clustering method, we used principal component analysis to cluster and visualize the same set of 175 genes (14). The principal component analysis clustering results shown in Fig. 5 illustrate that this unsupervised method produced the same results as the supervised hierarchical clustering method.

Interpretation of Clustering Results
Although some of the genes or operons differentially expressed in the presence or absence of ArcA are expected to be affected only indirectly, others whose expression is directly regulated by ArcA should possess a DNA-binding site(s) upstream of their transcriptional start site(s). ArcA is a 28-kDa protein that contains a winged helix-turn-helix motif that interacts with a poorly conserved consensus DNA sequence (31). This ArcA-P consensus sequence, obtained from DNA footprinting experiments performed with ~15 ArcA-regulated promoters, is 5′-WGTTAATTAW-3′ (where W is A or T) (31).
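For illustration, exact occurrences of the 10-bp consensus can be located by expanding the degenerate symbol W into the character class [AT]. This is only a sketch: because the consensus is poorly conserved, exact matching will miss most genuine sites, which is why a weight matrix is preferable in practice. The function names and the test sequence are hypothetical.

```python
import re

# Only the symbols needed for this consensus are listed.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "[AT]"}


def consensus_to_regex(consensus):
    """Translate a consensus with IUPAC symbols into a regex string."""
    return "".join(IUPAC[symbol] for symbol in consensus)


def find_sites(sequence, consensus="WGTTAATTAW"):
    """Return 0-based start positions of all (possibly overlapping) matches."""
    # A lookahead is used so that overlapping matches are all reported.
    pattern = re.compile("(?=" + consensus_to_regex(consensus) + ")")
    return [match.start() for match in pattern.finditer(sequence.upper())]


positions = find_sites("ccAGTTAATTATgg")
```

Only one strand is searched here; a complete search would also examine the reverse complement.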
Liu and De Wulf (22) used a weight matrix and a subset of 10 ArcA-footprinted promoter regions to define a slightly different consensus sequence of 5′-GTTAATTAAATGTTA-3′. This sequence resembles the previous 10-bp consensus sequence; however, it is extended by 5 residues at the 3′-end, and the first nucleotide of the original consensus sequence (5′-(A/T)) turned out to be poorly conserved and is not included in their motif (22). For the analyses reported here, a set of 26 known ArcA-binding sites in E. coli, including the 15 sites reviewed by Lynch and Lin (31) plus three newly footprinted ArcA-binding sequences,² was compiled (see the Supplemental Material). Analysis of these sequences with MEME Version 3.0 (32,33) identified a partially degenerate 15-bp motif. A weight matrix was generated from the motif found by MEME. The E. coli K12 genome was then scanned for sequences on either strand that had a weight matrix score exceeding the threshold and that were within 300 bp of an ORF origin, as identified by RegulonDB (34). A total of 386 such sequences were located.
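The weight matrix scan can be sketched in general terms as follows. This is not the MEME-derived matrix or the threshold used in this study, neither of which is given here; the log-odds scoring with a pseudocount and uniform background, the toy 4-bp sites, and the threshold are all assumptions for illustration. The additional requirement that a hit lie within 300 bp of an ORF origin is not implemented.

```python
import math

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def build_weight_matrix(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds matrix (one dict per column) from aligned sites."""
    total = len(sites) + 4 * pseudocount
    matrix = []
    for position in range(len(sites[0])):
        column = [site[position] for site in sites]
        matrix.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in BASES
        })
    return matrix


def score(matrix, window):
    """Sum the column scores for the bases in a window of matrix width."""
    return sum(column[base] for column, base in zip(matrix, window))


def reverse_complement(sequence):
    return sequence.translate(COMPLEMENT)[::-1]


def scan(matrix, sequence, threshold):
    """Return (strand, start, score) for windows scoring above threshold.

    Starts on the '-' strand are positions in the reverse complement.
    """
    width = len(matrix)
    hits = []
    for strand, seq in (("+", sequence), ("-", reverse_complement(sequence))):
        for start in range(len(seq) - width + 1):
            value = score(matrix, seq[start:start + width])
            if value > threshold:
                hits.append((strand, start, value))
    return hits


# Toy aligned sites, far shorter than the 15-bp motif described in the text.
toy_matrix = build_weight_matrix(["GTTA", "GTTA", "GTAA"])
hits = scan(toy_matrix, "CCGTTACC", threshold=4.0)
```

Scanning both strands matters because a binding site may be oriented in either direction relative to the regulated gene.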
When ArcA acts as an activator of gene expression, it most often binds to upstream sites centered from 60 to 120 base pairs before the transcriptional start site of the affected gene or operon. When it acts as a repressor of gene expression, it binds to other sites often located near the transcriptional start site of the affected gene or operon (31). Of the 42 genes down-regulated in the presence of ArcA (patterns I, V, and VI) (Fig. 3), 12 contain a documented ArcA-binding site or a predicted ArcA-binding site at or near the transcriptional start site using the above MEME/weight matrix method (Tables I, V, and VI). Of the 93 genes up-regulated in the presence of ArcA (patterns II, VII, and VIII) (Figs. 4 and 5), 14 contain an upstream documented or predicted ArcA-binding site (Tables II, VII, and VIII). Because the expression levels of the 40 genes of patterns III and IV were not affected by the presence or absence of ArcA, they are not expected to possess binding sites for this regulatory protein. However, five of these genes are predicted to possess a putative ArcA-binding site (Tables III and IV). Of these, three genes, cydA, nuoG, and nuoF, are known to be ArcA-regulated; however, the expression data are not consistent with previously published data, and this is likely due to paralog issues within the E. coli genome. Thus, the statistical and clustering methods described here produced results consistent with biological expectations.

Functional Classes of Genes Affected by Oxygen Availability and ArcA
The following discussion is limited to the 175 genes (our gold standard set) of regulatory patterns I-VIII (Fig. 4), although ArcA control of many other genes may be deduced from the information supplied in the Supplemental Material. As in our previous study (1), they represent many genes known to be oxygen-controlled and another larger set for which no previous information is available. These genes are listed in Tables I-VIII and represent genes involved in a large number of processes, including small molecule biosynthesis, macromolecular synthesis, and aerobic/anaerobic respiration and fermentation. Regardless of their metabolic role, these genes are discussed below in the context of their expression patterns (Fig. 3).
Expression Pattern I: Decreased Expression during Anaerobiosis and Increased Expression in an ArcA Strain-Among the 175 genes displayed in the clustering procedures described above, 37 showed decreased expression under anaerobic conditions due to regulation by ArcA (Table I). Of these 37 genes, 10 have been reported to be directly regulated by ArcA (6), and 27 are newly discovered genes that are regulated either directly or indirectly by this global regulatory protein. In addition, 23 of the genes clustered into pattern I were also identified as being down-regulated by the FNR protein in our previous study (1). Previously described ArcA-regulated genes will be discussed first, followed by a discussion of the newly discovered ArcA-down-regulated genes.
Seven genes of the tricarboxylic acid cycle clustered into pattern I: icdA, sdhAB, lpdA, mdh, sucD, and gltA. Each of these seven genes has been shown previously to be anaerobically repressed by the ArcA protein (5,6,9,10,12,13,31,35,36). Regulation of lpdA by FNR was also observed in our previous study (1). A search for putative ArcA-binding sites using our customized MEME/weight matrix method (see "Materials and Methods") identified one or more sites upstream of each of these genes (Table I).
The cyoA gene is the first member of the cyoABCDE operon, which encodes all of the subunits of the cytochrome o ubiquinol oxidase. The cyoA gene was expressed 10-fold higher when cells were grown aerobically and 23-fold higher when cells were grown anaerobically in the ArcA-deficient strain (Table I). A previous study by our laboratory using a cyoA::lacZ fusion in the same ArcA+ and ArcA− isogenic strains used in this work showed the same regulatory pattern (16). A site similar to the ArcA consensus sequence has been identified upstream of the cyoA promoter,³ and cyoA was also shown to be subject to regulation by FNR in our previous study (1), but this is likely indirect. Our MEME/weight matrix method identified four putative ArcA-binding sites (Table I) upstream of the cyoA gene.
² R. P. Gunsalus, unpublished data.
The nuoB and nuoE genes, which belong to the nuoA-N operon, encode NdhI (NADH dehydrogenase I), a membrane-associated, multisubunit, proton-translocating enzyme similar to complex I of eukaryotic mitochondria (37). Expression of both of these genes was lower under anaerobic conditions and elevated in the arcA mutant (Table I). A previous study using nuo::lacZ fusions established that nuo expression is subject to ArcA-mediated anaerobic repression (38). Two putative ArcA-binding sites were identified ~140 and 190 bp upstream of the nuoA gene using our MEME/weight matrix method (Table I).
The nuoE gene also appeared to be subject to FNR regulation in our previous work (1), but the effect of FNR may be indirect as a consequence of its role in regulating ArcA expression (39).
The remaining genes in this group have not been shown previously to be subject to ArcA regulation. These newly discovered genes fall into the same functional classes as the genes regulated by the leucine-responsive regulatory protein Lrp under aerobic conditions (25) and FNR under anaerobic conditions (1). These functional classes include genes for small molecule biosynthesis and transport and macromolecule biosynthesis. More interestingly, of the remaining 27 genes of this expression group, 20 were also found to be regulated by FNR under anaerobic conditions (1).
11 of the remaining genes of this expression group belong to the macromolecule synthesis class. Eight of these were also observed to be regulated by FNR (1). These are rpsA, rpsT, rpsJ, rplS, rplT, and rplM (ribosomal proteins); tufA (elongation factor Tu); and oppA (oligopeptide permease). The remaining three genes are rplX (ribosomal protein), pal (essential lipoprotein), and atpG (ATP synthase). Putative ArcA-binding sites were identified using the MEME/weight matrix for two of these: oppA and atpG.
The functions of the remaining four genes in this list, ycdC, yajG, yceD, and yfiA, remain to be characterized. Three of these four genes, yajG, yceD, and yfiA, were also observed to be regulated by FNR in anaerobiosis (1).
Recently, Liu and De Wulf (22) identified 234 ORFs as being repressed by ArcA under anaerobic conditions in a microarray-based study. In our gold standard set, we identified a total of 42 genes as being up-regulated in an arcA mutant (patterns I, V, and VI) or 37 genes in pattern I. Only three genes, gltA, icd, and mdh, are conserved between the two reported data sets. However, our clustering set of 175 genes is highly restricted, with a strict PPDE(<p) cutoff level of 0.997, and eliminates false positives and other genes for which the data are of lower statistical significance.
Expression Pattern II: Increased Expression during Anaerobiosis and Decreased Expression in an ArcA Strain-Transcription of the 57 genes of expression pattern II (Table II) was both induced in the absence of oxygen and positively regulated by ArcA. Moreover, of these 57 genes, 34 were also observed to be positively regulated by FNR in anaerobiosis (1). 19 of these genes are members of the small molecule metabolism and transport group. Among the genes for metabolism, eight were also observed to be positively regulated by FNR in anaerobiosis. These are pyrD (dihydro-orotate dehydrogenase), glnD (uridylyltransferase), mobB (molybdenum cofactor biosynthesis), speC (ornithine decarboxylase), narY (cryptic nitrate reductase subunit), glnE (glutamine synthetase/adenylyltransferase), tdh (threonine dehydrogenase), and tynA (tyramine oxidase). One of these genes, glnD, is predicted to have a putative ArcA-binding site (Table II).
The gadA and gadB genes, encoding two highly homologous glutamate decarboxylases, also clustered into this group. In agreement with our previous study (1), lacZ fusion studies have shown that their anaerobic induction is due solely to the presence of the arcA gene product,² but only the gadA gene has a predicted ArcA-binding site upstream of its start codon. The gadX and gadW genes also clustered into pattern II. These two genes encode transcription factors that control the expression of the gadA and gadBC operons (40-43). A putative ArcA-binding site(s) was identified upstream of each of these two genes (Table II). Two other genes, rhaA (L-rhamnose isomerase) and glgC (glucose-1-phosphate adenylyltransferase), have not been shown previously to be regulated by ArcA.
Six genes of this expression pattern belong to the small molecule transport functional class. Four of these genes were shown previously to be subject to FNR-mediated regulation (1). These genes are yabM (setA, glucose/lactose efflux transporter), yadQ (clcA, mammalian chloride channel protein homolog), nanT (sialic acid transporter), and uraA (transport of uracil). The remaining two genes belonging to this group are nfrA (an outer membrane protein) and pnuC (nicotinamide mononucleotide transporter).
As in our previous study (1), several genes of this expression pattern belonging to the macromolecular synthesis class are for DNA repair: recB and recC (subunits of the RecBCD enzyme complex), dinG (encodes a LexA-regulated DNA repair enzyme), and sbcC (co-suppressor of recBC mutations). Of the remaining five genes belonging to this functional group, only one was also observed to be regulated by FNR: glgA (glycogen synthesis). The other four genes are degQ (hhoA, periplasmic serine endopeptidase); cdh (CDP-diglyceride hydrolase); and two hydrogenase-encoding genes, hycD (hydrogenase-3 subunit) and hyaB (hydrogenase-1 subunit). Putative ArcA-binding sites were identified upstream of recB and hycD (Table II).
Of the remaining genes clustered into this expression pattern, two genes, mrcA (penicillin-binding protein 1A) and rarD (involved in chloramphenicol resistance), were also observed to be regulated by FNR (1). Two other genes, organized in an operon encoding a putative alternative cytochrome oxidase, appCB (cbdAB), were not observed previously to be regulated by FNR (1), and xylR (regulatory gene for the xylose operon) also clustered into this expression pattern. The 23 remaining members of this expression pattern are currently uncharacterized, 12 of which were also previously observed to be regulated by FNR (1). A putative ArcA-binding site was identified upstream of rarD and xylR and upstream of 2 of the 23 previously uncharacterized genes (ydbA and yhjE).
Only one gene in this expression pattern, glcC, was also identified in the study by Liu and De Wulf (22); however, their results indicated that glcG is repressed by ArcA (2.6-fold). Liu and De Wulf identified a total of 138 genes as being activated in the presence of ArcA. Again, in our gold standard set, we identified a total of 42 genes as being up-regulated in an arcA mutant (patterns II, VII, and VIII) or 57 genes in pattern II. However, our clustering set of 175 genes is highly restricted, with a strict PPDE(<p) cutoff level of 0.997.
Expression Pattern III: Decreased Expression during Anaerobiosis and No Change in an ArcA Strain. 34 genes clustered into expression pattern III. Of these, 23 clustered into the same expression pattern in our previous study (1), indicating that the expression of these genes, although decreased during anaerobiosis, is not regulated by either ArcA or FNR.
a Distance is upstream from the start codon of the gene or the first gene of the operon (if internal). Sites are predicted from the gene promoter. Other putative ArcA-binding sites may also be predicted upstream of a secondary promoter, but are not mentioned here.
Two members of the nuoA-N operon, nuoG and nuoE, which encode NdhI, a membrane-associated, multisubunit, proton-translocating enzyme similar to complex I of eukaryotic mitochondria (37), clustered into pattern I. Expression of the nuoE gene (Table III) Table IX). A previous study using nuo-lacZ fusions established that nuo expression is subject to ArcA-mediated anaerobic repression and NarL nitrate-mediated anaerobic activation (38). Two other members of this operon clustered into pattern I (nuoB and nuoE) (Table I).
The cydA gene (part of the cydAB operon) encodes the high affinity terminal oxidase of the oxygen respiratory chain, cytochrome d oxidase. The data obtained here show that cydA was repressed ~2-fold during anaerobic growth, but was unchanged in the ArcA-deficient strain (Table III). In agreement with these findings, previous studies using cydA::lacZ fusions showed that transcription of the cydAB operon is ArcA-repressed when oxygen becomes limiting (16,44,45). Subsequent studies have shown that ArcA functions to anti-repress cydAB transcription when oxygen is limiting (46), whereas FNR is required for repression when the oxygen tension is decreased further (14, 17, 45). As our study was carried out in full anaerobiosis, the ArcA effect was not observed, but the FNR effect was observed in our previous study (1). There are three ArcA sites that have been footprinted (17,31). The study by Liu and De Wulf (22) also identified cydA to be ArcA-controlled; however, their study indicated that it is ArcA-activated (5.2-fold).
The remaining 31 genes of this cluster have not been studied previously for their expression under anaerobic growth conditions; however, one contains a putative ArcA-binding site (ykgI) (Table III). Again, the genes of this cluster are members of the same functional classes as those of expression patterns I and II. Three genes (fabG, rfbX, and katE) are involved in small molecule metabolism. 17 genes (rplB, rplC, rplO, hflC, rplF, rplQ, rplI, rpsE, rho, prfB, rplD, rpsH, tsf, rplE, nfi, tig, and lysS) are involved in macromolecule synthesis or degradation. 10 genes of this cluster are of unclassified function, seven of which were also identified in our FNR study (1). The remaining gene, eaeH (homologous to attachment and effacement proteins), also clustered into this expression pattern.
Expression Pattern IV: Increased Expression during Anaerobiosis and No Change in an ArcA Strain. The six genes of this cluster (Table IV) showed elevated expression under anaerobic growth conditions, but were not affected by deletion of the arcA allele. Two genes of unknown function clustered into this group (ybeD and ygjD) and also clustered into the same group in our FNR study (1). The remaining members of this cluster include htpG (a heat shock protein), mrr (involved in the restriction of methylated adenine residues; also clustered into this group in Ref. 1), cysK (cysteine synthase), and cof (complementation of fur). A search of the promoter regions of these six genes identified a putative ArcA-binding site upstream of one of these genes: ybeD. None of these genes were identified by Liu and De Wulf (22).

Expression Pattern V: Increased Expression during Anaerobiosis and Increased Expression in an ArcA Strain. This cluster contains only a single gene of unknown function: ybjX (Table V). A similar pattern of expression was also observed previously (1).
Expression Pattern VI: No Change during Anaerobiosis and Increased Expression in an ArcA Strain. Of the four genes of this cluster, three are involved in small molecule metabolism and transport: gapA (structural gene for glyceraldehyde-3-phosphate dehydrogenase A, essential for glycolysis), potF (member of the potFGHI operon involved in the transport of putrescine), and hisJ (member of the hisTJQMP operon encoding a histidine-binding protein that is part of the periplasmic permeases for the high affinity uptake of histidine). The final member, ydcF, is currently uncharacterized. All four members of this expression pattern clustered into the same group in our FNR study (1).
Expression Pattern VII: Decreased Expression during Anaerobiosis and Decreased Expression in an ArcA Strain. The same five genes observed in this expression pattern were also observed in our study with FNR (1). Two of the genes, frdA and nirB, have been shown previously to be FNR-regulated (47-49). As we discussed previously (1), the discrepancy in these data is likely due to paralogs of these two genes in the genome (sdhA for frdA; nirD, cysI, and cysJ for nirB). The remaining genes include rpmC (ribosomal protein) and two uncharacterized genes, ybdE (cusB) and ylcD (cusA).
Expression Pattern VIII: No Change during Anaerobiosis and Decreased Expression in an ArcA Strain. This cluster contains 31 genes, 20 of which are of unknown function. (12 were also identified in our previous FNR study (1).) Of the 11 genes of known function (Table VIII), two are known to be regulated by oxygen and/or ArcA under anaerobic growth conditions, and four contain putative ArcA-binding sites.
The two genes reported to be regulated by oxygen and/or ArcA are fumB and lysU. The anaerobic fumarase, encoded by fumB, is known to be activated during anaerobic fermentative growth (50,51), and Tseng (51) showed that both ArcA and FNR are responsible for this anaerobic activation. As stated in our previous work (1), although our microarray data indicate that fumB is not regulated with respect to oxygen, its presence in this expression pattern is probably a result of the high sequence identity (80%) between fumB and the aerobically expressed fumarase, fumA.
The lysU gene encodes one of the two lysyl-tRNA synthetases (the other being lysS, with which it shares 79% sequence identity (52)) and was reported previously to be induced under anaerobic conditions (53).
The functional class distribution of the 175 genes of regulatory patterns I-VIII is shown in Fig. 6. Roughly 37.7% are hypothetical or unclassified, whereas another 23.4% are involved in small molecule metabolism. Most of the previously documented oxygen-controlled genes fall into the category of carbon and energy metabolism (5%). The study by Liu and De Wulf (22) identified 58 new genes/operons that are implicated in energy metabolism, transport, survival, catabolism, and transcriptional regulation.

Genes Not Expressed in at Least One Experiment
Only those genes exhibiting an expression level greater than zero in all experiments were used for statistical analysis. To identify differentially expressed genes that were not expressed under one condition but turned on under another treatment condition (or vice versa), gene measurements containing zero expression values were set aside and are listed in Table IX. This set contains only eight genes with expression values of at least 1 × 10⁻⁴ of total mRNA for all measurements in at least one experiment with a coefficient of variance <0.2 (Table IX). Seven are members of pattern II (increased expression during anaerobiosis and decreased expression in an ArcA strain): yddS, ftsW, hyaD, ldcC, ybdA, yhgE, and yrbF. The remaining gene, frdB, is a member of pattern VII (decreased expression during anaerobiosis and decreased expression in an ArcA strain). Two of these genes contain putative ArcA-binding sites: yddS and yhgE (Table IX).
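The selection rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure actually used for Table IX: the data layout (a dictionary of per-condition replicate lists), the function names, and the toy values are all hypothetical, and the sample standard deviation is assumed for the coefficient of variance.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (infinite if the mean is 0)."""
    m = mean(values)
    if m <= 0 or len(values) < 2:
        return float("inf")
    return stdev(values) / m

def on_off_candidates(expression, min_level=1e-4, max_cv=0.2):
    """Return genes that have at least one zero measurement but are reliably
    expressed (all replicates >= min_level, CV < max_cv) in at least one
    experiment. `expression` maps gene -> {experiment: [replicate values]},
    with values expressed as a fraction of total mRNA (hypothetical layout)."""
    selected = []
    for gene, by_experiment in expression.items():
        has_zero = any(v == 0 for reps in by_experiment.values() for v in reps)
        if not has_zero:
            continue  # fully expressed genes go to the regular statistical analysis
        for reps in by_experiment.values():
            if all(v >= min_level for v in reps) and coefficient_of_variation(reps) < max_cv:
                selected.append(gene)
                break
    return selected

# Toy data, invented for illustration only.
toy = {
    "geneA": {"wild_type": [2.0e-4, 2.2e-4, 1.9e-4, 2.1e-4], "arcA": [0.0, 0.0, 0.0, 0.0]},
    "geneB": {"wild_type": [3.0e-4, 3.1e-4, 2.9e-4, 3.0e-4], "arcA": [2.8e-4, 3.0e-4, 2.9e-4, 3.1e-4]},
    "geneC": {"wild_type": [0.0, 5.0e-4, 1.0e-5, 2.0e-4], "arcA": [0.0, 0.0, 0.0, 0.0]},
}
print(on_off_candidates(toy))
```

In this toy set only geneA passes: geneB has no zero measurements, and geneC is never consistently expressed in any one experiment.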

Venn Diagram
To better visualize the interaction between the oxygen, ArcA, and FNR regulons, Venn diagrams were created (Fig. 7). The top 500 genes (sorted by p value) from each data set were used as in the construction of the 175-gene list described above and the 205-gene list from our previous study (1). Interestingly, 303 genes were found to be regulated by both ArcA and FNR, and 74 of these genes showed additional regulation by oxygen (Fig. 7A). This is in contrast to the 16 genes reported previously to be co-regulated (5,6).
In looking at the top 500 genes from each group, 48 genes were identified as being subject solely to ArcA regulation and 57 solely to FNR regulation under anaerobic conditions. The remaining 321 genes pose an interesting question as to whether or not another global oxygen regulator that has yet to be identified exists within the E. coli genome. Moreover, the 378 genes in the ArcA grouping and the 369 genes in the FNR grouping that do not show regulation by oxygen, but that are regulated by each of these proteins (or co-regulated) under anaerobic conditions, suggest that these two proteins may also be important for adaptation to the anaerobic environment. It is also important to note that a large proportion of the 515 genes in this latter group are currently of unknown function. In addition to the comparisons above, a second comparison between the ArcA, FNR, and Lrp (25) regulons was also done (Fig. 7B), as we had indicated previously an overlap between the FNR and Lrp data sets (1). This diagram reveals 48 genes overlapping between the Lrp and FNR regulons, 43 genes overlapping between the ArcA and Lrp regulons, and 26 genes overlapping between all three (data not shown). These comparisons strongly suggest that regulatory networks are more complex than described previously.
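The region counts of a three-set Venn diagram such as those discussed above follow directly from set operations on the gene lists. The sketch below is illustrative only; the function name, the region labels, and the toy gene identifiers are assumptions and do not reproduce the data behind Fig. 7.

```python
def venn3_counts(oxygen, arca, fnr):
    """Count the genes falling in each of the seven regions of a three-set
    Venn diagram. Inputs are iterables of gene names (hypothetical lists)."""
    o, a, f = set(oxygen), set(arca), set(fnr)
    return {
        "oxygen only": len(o - a - f),
        "ArcA only": len(a - o - f),
        "FNR only": len(f - o - a),
        "oxygen and ArcA only": len((o & a) - f),
        "oxygen and FNR only": len((o & f) - a),
        "ArcA and FNR only": len((a & f) - o),
        "all three": len(o & a & f),
    }

# Toy gene lists, invented for illustration only.
oxygen_top = ["g1", "g2", "g3", "g4"]
arca_top = ["g2", "g3", "g5", "g6"]
fnr_top = ["g3", "g4", "g6", "g7"]

counts = venn3_counts(oxygen_top, arca_top, fnr_top)
for region, n in counts.items():
    print(f"{region}: {n}")
```

Because the seven regions are disjoint, their counts must sum to the size of the union of the three lists, which is a convenient consistency check when reading counts off a published diagram.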

Comparison with Other Studies
When different array formats are used, the magnitudes and sources of experimental errors are surely different. This raises the question of whether or not results obtained from experiments performed with different DNA array formats such as pre-synthesized filter arrays and in situ synthesized Affymetrix GeneChips can be compared with one another. We have previously addressed this question. Hung et al. (25) compared the results of 4-fold replicated gene expression profiles of otherwise wild-type and lrp isogenic E. coli strains performed with these two DNA microarray formats. To emphasize variance due to format differences, the same RNA samples were used for target preparation for both formats, and the data were analyzed with Cyber-T software as described here. When the top 100 genes with the lowest p values obtained with each format were compared, a highly significant number of genes, 29, were in common.
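The text does not state how the significance of this overlap was assessed. One conventional way to gauge it, sketched below, is the hypergeometric distribution: if two lists of 100 genes were drawn independently and at random from the same pool, how likely is an overlap at least this large? The pool size of 4290 genes (the E. coli gene count used elsewhere in this paper) and the independence of the two lists are assumptions made for illustration, and the function names are hypothetical.

```python
from math import comb

def expected_overlap(n_genes, list_size):
    """Mean overlap of two random, equally sized lists drawn from n_genes."""
    return list_size * list_size / n_genes

def overlap_tail_probability(n_genes, list_size, observed):
    """P(overlap >= observed) for two random lists of `list_size` genes
    drawn from a pool of `n_genes` (hypergeometric upper tail)."""
    total = comb(n_genes, list_size)
    favourable = sum(
        comb(list_size, k) * comb(n_genes - list_size, list_size - k)
        for k in range(observed, list_size + 1)
    )
    return favourable / total  # exact integers, converted to float at the end

n_genes = 4290  # assumed size of the common gene pool
print(f"expected overlap by chance: {expected_overlap(n_genes, 100):.2f}")
print(f"P(overlap >= 29): {overlap_tail_probability(n_genes, 100, 29):.3g}")
```

Under these assumptions roughly two genes would be shared by chance, so an overlap of 29 is far outside what random selection would produce. The same function can be applied to any other pair of equally sized gene lists.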
Liu and De Wulf (22) have reported the transcriptional profiles of arcA⁺ and arcA⁻ E. coli cells grown under anaerobic conditions and generously provided us with their raw data. A comparison of these Affymetrix GeneChip data with our filter array data, both analyzed with Cyber-T software, does not show significant agreement. Of the top 100 genes with the lowest p values (<0.018) obtained with each format, only three genes were in common. Because Liu and De Wulf used a different data analysis software package (Spotfire) and defined differentially expressed genes as those with an expression level coefficient of variance <0.8 and a mutant to wild-type signal ratio of >2 with p < 0.05, it is not possible to directly compare their results with the results presented here. In addition, Liu and De Wulf also used a different carbon source (xylose rather than glucose). We can, however, compare conclusions. They reported 58 differentially expressed genes or operons under the direct control of ArcA as evidenced by the presence of a documented or putative DNA-binding site. In our data set, these genes exhibit p values ranging from 3.8 × 10⁻⁶ to 0.9 and PPDE(p) values ranging from 1.0 to 3.2 × 10⁻⁸. This suggests many false negatives and false positives in the data set of Liu and De Wulf.

Implications for Genome-wide Control by ArcA and FNR
In this study, we employed statistical methods (1,25) for the identification of differentially expressed genes based on experiment-wide false positive and false negative measurement levels. These methods previously allowed us to infer differential expression for more than one-third of the 4290 genes of E. coli during growth in the presence or absence of oxygen (1445 genes) (1). This study has allowed us to determine that ~1243 of these changes in expression level are mediated either directly or indirectly by ArcA (Fig. 2B). These results further support our previous conclusions (1) that the network of genes required for the transition of cells from aerobic to anaerobic growth conditions is as much as 10 times larger than previously suspected. A comparison of the ArcA and FNR gold standard sets showed that 303 genes were regulated by both proteins (Fig. 7A), 74 of which were also affected by oxygen. Previous to this study, only 16 genes had been reported to be co-regulated (5,6). Therefore, as suggested previously by us (1) and Liu and De Wulf (22), the total number of genes directly activated or repressed by ArcA and FNR is likely to be much higher than documented previously.

Rationale of Regulatory Patterns
Regulatory pattern I (anaerobic repressed gene expression, i.e. decreased expression in the presence of ArcA) (Table I) and pattern II (anaerobic activated gene expression, i.e. increased expression in the presence of ArcA) (Table II) are most easily reconciled with previous reports. Of the 94 genes of these patterns, 24 contain known or putative ArcA-binding site motifs. These results suggest that we might expect the total number of genes directly activated or repressed by ArcA to be in the range of 290 genes. Liu and De Wulf (22) estimated 372 genes.
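The figure of roughly 290 genes is not derived explicitly in the text. One reading, offered here as an assumption rather than as the authors' stated calculation, is that the fraction of pattern I and II genes carrying a known or putative ArcA-binding site (24 of 94) is applied to the 1139 genes inferred in this paper to be regulated directly or indirectly by ArcA. The function name below is hypothetical.

```python
def extrapolate_direct_targets(genes_with_site, genes_in_patterns, regulated_genes):
    """Scale the fraction of genes with a binding site up to the full set of
    ArcA-regulated genes (assumes the fraction is representative)."""
    fraction = genes_with_site / genes_in_patterns
    return fraction * regulated_genes

# 24 of the 94 pattern I/II genes have a site; 1139 genes are inferred to be ArcA-regulated.
estimate = extrapolate_direct_targets(24, 94, 1139)
print(round(estimate))  # 291
```

Under this reading, about one-quarter of the ArcA-regulated genes would be direct targets, which is consistent with an estimate in the range of 290 genes.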
Regulatory pattern III (anaerobically repressed, but not affected by ArcA) and pattern IV (anaerobically activated, but not affected by ArcA) are most easily explained as genes controlled by the FNR protein or by an as yet unidentified global regulator such as Lrp, IHF, FIS, or H-NS. Only two of these genes, nuoG and nuoF, are known members of the ArcA regulon.
As in our previous work (1), it is more difficult to understand the physiological roles that the genes of regulatory patterns V-VIII might play in anaerobic metabolism. However, these genes are still members of the same functional classes regulated by FNR (1) and Lrp (25). To illustrate the overlap between genes regulated by ArcA, FNR, and Lrp, a Venn diagram was constructed (Fig. 7B). Three gene sets were compared: the 500 genes with the highest PPDE(p) values (>0.996232) and the lowest p values (<5.26E-04) obtained from the array experiments reported here comparing arcA isogenic strains under anaerobic growth conditions; the 500 genes with the highest PPDE(p) values (>0.991) and the lowest p values (<0.0014) obtained from the array experiments reported here comparing fnr isogenic strains under anaerobic growth conditions; and the genes with the highest PPDE(p) values (>0.80) and the lowest p values (<0.027) obtained from the Lrp array experiments comparing lrp isogenic strains under aerobic growth conditions (25). Among these three gene sets, 26 genes are present in all three, and 43 genes overlap between the ArcA and Lrp regulons. This further supports our previous suggestion (1) that the FNR, Lrp, and now ArcA regulons reveal overlapping functions under aerobic and anaerobic conditions.

Conclusion
In this, our fourth study of global gene expression profiling in E. coli K12, we have again employed rigorous statistical treatment of the data to infer differential expression for 1139 genes in the presence and absence of the ArcA regulatory protein. In agreement with our previous study on the FNR protein (1) and the study of Liu and De Wulf (22), these results demonstrate that the network of genes required for the transition of cells from aerobic to anaerobic growth conditions is much larger than previously suspected (~8-10-fold).
A total of 30 genes had been documented previously as members of the ArcA regulon (5,6). The study by Liu and De Wulf (22) suggested that 372 genes (or ~9% of the E. coli genome) are potential members of the ArcA regulon. The results presented here identify 135 of 175 genes with p values <0.000174 and PPDE(<p) values >0.9994 whose expression is affected by ArcA. However, if we include all genes expressed at a level above the background and examine the PPDE versus p value plots, we have a 63% confidence level that any gene in our oxygen-regulated set is differentially expressed (1), i.e. 63% of the 2820 genes or ~1700 genes. In the same manner, using the same PPDE versus p value plots, 67% of these 1700 genes or 1139 genes are either directly or indirectly regulated by ArcA. Thus, these results greatly expand our knowledge of genes that compose the ArcA regulatory network.