Probing the ArcA-P Modulon of Escherichia coli by Whole Genome Transcriptional Analysis and Sequence Recognition Profiling

The ArcB/ArcA two-component signal transduction system of Escherichia coli regulates gene expression in response to the redox conditions of growth. Over the years, genetic screens have led to the identification of about 30 ArcA-P-controlled operons that are involved in redox metabolism. However, the discovery of 3 targets that are not implicated in respiratory metabolism (the tra operon for plasmid conjugation, psi site for Xer-based recombination, and oriC site for chromosome replication) suggests that the Arc modulon may comprise additional operons that are involved in a myriad of functions. To identify these operons, we derived the ArcA-P-dependent transcription profile of E. coli using oligonucleotide-based microarray analysis. The findings indicated that 9% of all open reading frames in E. coli are affected either directly or indirectly by ArcA-P. To identify which operons are under the direct control of ArcA-P, we developed the ArcA-P recognition weight matrix from footprinting data and used it to scan the genome, yielding an ArcA-P sequence affinity map. By overlaying both methods, we identified 55 new Arc-regulated operons that are implicated in energy metabolism, transport, survival, catabolism, and transcriptional regulation. The data also suggest that the Arc response pathway, which translates into a net global downscaling of gene expression, overlaps partly with the FNR regulatory network. A conservative but reasonable assessment is that the Arc pathway recruits 100–150 operons to mediate a role in cellular adaptation that is more extensive than hitherto anticipated.

In Escherichia coli, gene expression in response to changing respiratory conditions of growth is partially mediated by the Arc two-component signal transduction system (1-6), which comprises the transmembrane ArcB sensor kinase and its cytosolic cognate response regulator ArcA (7,8). Under anaerobic or microaerobic conditions, ArcB undergoes autophosphorylation and then catalyzes the transphosphorylation of ArcA. Under aerobic conditions, oxidized forms of quinone electron carriers in the membrane inhibit the autophosphorylation of ArcB and therefore its mediation of the Arc metabolic response (9). Phospho-ArcA (ArcA-P) represses certain target operons (e.g. glcDEFGB, Ref. 14). To date, some 30 operons are known to be controlled by ArcA-P, most of which are involved in respiratory metabolism (6).
The 7-bp sequence 5′-TATTTaa-3′ (the lowercase letters are less-conserved nucleotides) was proposed as a putative signature for promoter recognition by ArcA-P, based on DNase I protection experiments at the pflA promoter (14). A subsequent homology search that included ArcA-P-protected promoter regions of cydAB, pflA, gltA, lldPRD, sdhCDAB, and sodA, plus the entire promoter regions of 16 additional operons whose expressions are ArcA-P-controlled, led to the suggestion of 5′-nGTTAATTAn-3′ (n is A or T) as the ArcA-P binding consensus (15). This 10-bp consensus proved useful for locating ArcA-P binding sites at two novel targets that are not involved in respiratory metabolism: the tra operon for conjugation of resistance plasmid R1 (16), and the psi site for Xer-based recombination in plasmid pSC101 (17). The identification of these functionally distinctive targets, and of the Arc-controlled chromosome replication site oriC (18), suggests that more (unexpected) operons may be under the transcriptional control of ArcA-P.
To discover these operons and possibly new functions of the Arc system that reach beyond redox metabolism, we undertook two complementary approaches. First, a profile of ArcA-P-dependent gene expression in E. coli was obtained using oligonucleotide-based microarray analysis. Second, an ArcA-P recognition weight matrix was derived from footprinted promoter sequences and used to screen the E. coli genome to locate potential ArcA-P binding sites. By combining both techniques we aimed to: (i) identify operons that are most likely under the direct control of ArcA-P, (ii) estimate conservatively the number of operons that are Arc-controlled, and (iii) reveal which physiological roles are governed by the Arc pathway in E. coli.

EXPERIMENTAL PROCEDURES
ArcA-P Weight Matrix Development and Screening-The method of Berg and von Hippel (19) was used to score the E. coli genome sequence with the ArcA-P weight matrix, which was developed from footprinted promoter regions of 11 ArcA-P-controlled operons (Table I) using the AlignACE program (20). The matrix-screening method predicts the affinity of ArcA-P for any genomic 15-bp DNA sequence, based on the sequence statistics of the input promoter regions. Both strands of the E. coli K-12 MG1655 genome sequence, obtained from GenBank™ entry U00096, were searched. Near-symmetric sites with high scores in both the forward and reverse directions were counted only once, and the higher of the two scores was used.
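As a rough illustration of the matrix-screening step, the following Python sketch scores every window of a sequence on both strands and keeps the higher of the two scores. It assumes one commonly cited form of the Berg and von Hippel log transformation, in which each position contributes ln[(n_b + 0.5)/(n_max + 0.5)]. The 0.5 pseudocount, the function names, and the toy 5-bp sites are assumptions made for illustration and are not taken from this work.

```python
import math

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def count_matrix(sites):
    """Per-position base counts for equal-length aligned sites."""
    counts = [{b: 0 for b in BASES} for _ in range(len(sites[0]))]
    for site in sites:
        for pos, base in enumerate(site):
            counts[pos][base] += 1
    return counts


def site_score(seq, counts, pseudo=0.5):
    """Sum over positions of ln((n_base + pseudo) / (n_most_common + pseudo)).

    The best possible site scores 0; every mismatch makes the score more negative.
    """
    total = 0.0
    for pos, base in enumerate(seq):
        best = max(counts[pos].values())
        total += math.log((counts[pos][base] + pseudo) / (best + pseudo))
    return total


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def scan(sequence, counts):
    """Score each window on both strands and keep the higher score per position."""
    width = len(counts)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        forward = site_score(window, counts)
        reverse = site_score(reverse_complement(window), counts)
        hits.append((start, max(forward, reverse)))
    return hits


# Hypothetical toy data (5-bp sites and a 10-bp "genome"), for illustration only.
toy_counts = count_matrix(["GTTAA", "GTTAT", "GTAAA"])
toy_hits = scan("CCGTTAATCC", toy_counts)
best_hit = max(toy_hits, key=lambda hit: hit[1])
```

In this toy case the window starting at index 2 matches the most common base at every position and therefore receives the maximum score of 0. Taking the maximum over the two strands at each position is one simple way to avoid counting a near-symmetric site twice.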
Strains and Growth Conditions-E. coli wild-type strain MC4100 (F− araD139 Δ(argF-lac) U169 rpsL150 deoC1 relA1 thiA ptsF25 flb-5301 rbsR), and isogenic arcA::kan deletion strain ECL5331 were used in the experiments. The arcA::kan allele was constructed according to the method of Link et al. (21) and then P1 transduced into wild-type strain MC4100, yielding strain ECL5331. To obtain total RNA for use in microarray and quantitative real-time PCR analyses (RT-PCR), both strains were grown anaerobically in 400 ml of MOPS-buffered (100 mM, pH 7.4) Luria-Bertani (LB) medium that was supplemented with 20 mM of D-xylose. The cells were grown at 37°C under cap-sealed conditions with constant stirring with a magnetic bar until an optical density (OD600) of 0.3-0.4 was reached.
Total RNA Isolation-Upon reaching an OD600 of 0.3-0.4, nine independent cultures of the wild-type and mutant strains were pooled, yielding three sets of three cultures per strain (the experimental setup is detailed in Fig. 4). The cultures were cooled down with shaved ice, harvested, and subjected to total RNA isolation using the hot phenol extraction method (22). The obtained RNA was treated with RNase-free DNase I (Invitrogen) in the presence of RNA Recombinant Ribonuclease Inhibitor (RNAout, Invitrogen). The absence of contaminant DNA was then confirmed by RT-PCR, and the quality of the RNA examined using agarose gel electrophoresis. The RNA concentrations were measured at OD260, and the three independent batches of RNA (for both strains) were aliquoted and stored at −80°C for use in experiments.
Quantitative Analysis of Gene Expression by Real-Time PCR of Reverse Transcribed RNA-cDNA was first synthesized from 2 μg of total RNA using the SuperScript™ First-Strand Synthesis System (Invitrogen) according to the manufacturer's instructions. RT-PCR amplification of the cDNAs (50°C, 2 min; 95°C, 10 min; 95°C, 15 s; 60°C, 1 min; 40 cycles) was performed with SYBR Green I Dye in a reaction mixture containing the SYBR PCR Master Mix (Applied Biosystems), 1 pM of the forward and reverse primers, DEPC-treated water, and either cDNA or genomic DNA (25-μl reaction volume). The dye-labeled PCR products were quantified with a 7700 Sequence Detector (Applied Biosystems). A standard curve for normalizing the concentration of cDNAs was generated by amplifying envZ from genomic DNA of the wild-type strain, because envZ transcription is not subject to ArcA-P control (data not shown).
Analysis of Gene Expression Using Oligonucleotide-based Affymetrix GeneChip® Microarrays-cDNA for microarray experiments was synthesized from 5-15 μg of total RNA using the SuperScript Double-Stranded cDNA Synthesis kit (Invitrogen) and random primers (Promega) according to the manufacturers' instructions. The RNA template was removed from the cDNA by adding 1 N NaOH, followed by cDNA purification using QIAquick columns (Qiagen). Next, 3-7 μg of cDNA were fragmented with DNase I (Promega) and labeled (1 h, 37°C) with biotinylated dd-UTP using the Bioarray™ Terminal Labeling kit (Enzo). The labeled cDNA fragments were then hybridized (16 h, 45°C) to E. coli Antisense Genome Oligonucleotide Arrays (Affymetrix). Arrays were washed at 25°C with 6× SSPE buffer (900 mM NaCl, 60 mM NaH2PO4, 6 mM EDTA + 0.01% Tween 20), followed by a stringent wash at 50°C with 100 mM MES, 100 mM NaCl, and 0.01% Tween 20. The microarrays were then stained with phycoerythrin-conjugated streptavidin (Molecular Probes), and the fluorescence intensities measured after laser confocal scanning (Hewlett-Packard) with Microarray software (Affymetrix). Sample loading and variations in staining were standardized by scaling the average of the fluorescent intensities of all genes to a constant target intensity of 250 for all arrays used. The signal intensity for each open reading frame (ORF) was calculated as the average intensity difference, represented by [Σ(PM − MM)/(number of probe pairs)], where PM and MM denote perfect match and mismatch probes.
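The two signal calculations described above (the average intensity difference per ORF and the scaling of each array to a target intensity of 250) can be sketched in a few lines of Python. This is a minimal illustration and not the vendor software's implementation; the function names and the toy probe intensities are assumptions.

```python
def average_difference(pm, mm):
    """Average intensity difference: sum(PM - MM) / number of probe pairs."""
    if len(pm) != len(mm) or not pm:
        raise ValueError("need equally many perfect-match and mismatch intensities")
    return sum(p - m for p, m in zip(pm, mm)) / len(pm)


def scale_to_target(signals, target=250.0):
    """Rescale all ORF signals on one array so that their mean equals the target."""
    mean_signal = sum(signals.values()) / len(signals)
    factor = target / mean_signal
    return {orf: value * factor for orf, value in signals.items()}


# Hypothetical probe intensities for one ORF, and a toy two-ORF array.
toy_signal = average_difference([100, 200, 300], [50, 100, 150])
toy_scaled = scale_to_target({"orf1": 100.0, "orf2": 300.0})
```

Scaling by a single factor per array leaves the ratios between ORFs on that array unchanged, and only makes signals comparable across arrays that differ in loading or staining.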
Statistical Analysis of Microarray Profiles-To determine the reproducibility of the three independent microarray analyses that were performed per strain (Fig. 4), the ORF signals derived from each microarray were correlated using Spotfire DecisionSite (Spotfire Inc.). Because these correlations were excellent (Fig. 6, A-F), we averaged the triple data sets for both the wild-type and mutant strains. To extract statistically relevant information from the data suites, the ORF hybridization signals were passed through three steps of data filtration: (i) a coefficient of variation <0.8; (ii) a mutant-to-wild type signal ratio larger than 2, positive or negative (|log2(MT:WT)| > 1); and (iii) signal ratios with p < 0.05 (Student's t test). The filtered signals were then ascribed to ORFs whose expressions are (in)directly affected by ArcA-P.
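The three filtration steps can be written out as a short Python sketch. Several details are assumptions, because the text does not specify them: the coefficient of variation is computed per strain from the three replicates, the t test is taken to be a two-sample, equal-variance, two-sided test, and, with three replicates per strain (4 degrees of freedom), p < 0.05 is checked against the tabulated critical value t = 2.776 instead of computing a p value. The function names and example signals are hypothetical, and positive, non-identical signals are assumed.

```python
import math
import statistics

# Two-sided 5% critical value of Student's t for 4 degrees of freedom (3 + 3 replicates).
T_CRIT_DF4 = 2.776


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.fmean(values)


def pooled_t(a, b):
    """Two-sample t statistic with pooled (equal) variance."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(
        pooled_var * (1 / na + 1 / nb))


def passes_filters(mt, wt, cv_max=0.8):
    """Apply the three filters to one ORF (three mutant and three wild-type signals)."""
    # (i) coefficient of variation below the cut-off in both strains
    if coefficient_of_variation(mt) >= cv_max or coefficient_of_variation(wt) >= cv_max:
        return False
    # (ii) more than a 2-fold change in either direction
    if abs(math.log2(statistics.fmean(mt) / statistics.fmean(wt))) <= 1:
        return False
    # (iii) significant at the 5% level
    return abs(pooled_t(mt, wt)) > T_CRIT_DF4


# Hypothetical replicate signals for three ORFs.
changed = passes_filters([800, 900, 1000], [200, 250, 300])
unchanged = passes_filters([240, 250, 260], [245, 250, 255])
noisy = passes_filters([100, 900, 2000], [200, 250, 300])
```

Applying the filters in this order mirrors the text: noisy signals are removed first, small changes second, and the t test is run only on what remains.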

RESULTS
Derivation of the ArcA-P Weight Matrix for Sequence Recognition Profiling-Support for ArcA-P binding at the consensus sequence 5′-nGTTAATTAn-3′ (n is A or T, Ref. 16) was previously obtained from footprinting and site-directed mutagenesis experiments at the aldA promoter region (23). Nonetheless, the palindromic character of the sequence and its high A/T-content make the identification of functional ArcA-P binding sites difficult in A/T-rich promoter regions. In order to refine the recognition consensus, we searched for a common motif within ten ArcA-P footprinted promoter regions (400-bp upstream to 100-bp downstream of the start codon) of eleven ArcA-P-controlled operons (Table I). The screen was performed with the program AlignACE (20), which uses the Gibbs sampler algorithm (24). The analyzed operons were aldA (23), cydAB (1), glcDEFGB (10), gltA and sdhCDAB (4,15,25), icdA (11), lldPRD (4), lpdA (26), pflA (14), sodA (13), and traY (16) (Table I).
Highly conserved stretches of 15 base pairs were found in the footprinted regions of all input promoters. By determining the base frequency at each position, we converted these aligned sequences into a weight matrix (Fig. 1). The sequence of most conserved bases, 5′-GTTAATTAAATGTTA-3′, resembles the previous 10-bp consensus (5′-nGTTAATTAn-3′, n is A or T). However, the first nucleotide of the consensus (5′-A/T) turned out to be poorly conserved and is not included in the present motif. On the other hand, the motif is extended by 5 residues at the 3′-end.
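The conversion of aligned sites into per-position base frequencies, and the read-out of the most conserved base at each position, can be illustrated with a short Python sketch. The four 8-bp sites below are invented for illustration and are not the footprinted sequences of Table I; the function names are likewise assumptions.

```python
from collections import Counter


def base_frequencies(sites):
    """Frequency of A, C, G and T at each position of equal-length aligned sites."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    matrix = []
    for pos in range(length):
        column = Counter(site[pos] for site in sites)
        matrix.append({base: column[base] / len(sites) for base in "ACGT"})
    return matrix


def most_conserved(matrix):
    """String made of the most frequent base at each position."""
    return "".join(max(column, key=column.get) for column in matrix)


# Hypothetical aligned sites, for illustration only.
toy_sites = ["GTTAATTA", "GTTAACTA", "GTTTATTA", "ATTAATTA"]
toy_freqs = base_frequencies(toy_sites)
toy_consensus = most_conserved(toy_freqs)
```

Unlike a rigid consensus string, the frequency matrix retains how strongly each position is conserved, so a mismatch at a variable position is penalized less than one at an invariant position.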
TABLE I
[...] (20) in the 10 promoter regions of 11 footprinted ArcA-P-controlled operons (27). The base frequency at each nucleotide position within the 15-bp sites was used to calculate the ArcA-P recognition weight matrix (Fig. 1).
Columns: Operon; Putative ArcA-P binding site.
a Binding site is located in the promoter region shared by the oppositely transcribed gltA and sdhCDAB operons.

Screening the E. coli Genome with the ArcA-P Recognition Weight Matrix-Each of the 4,639,206 successive 15-bp stretches of the E. coli K-12 MG1655 genome was next scored on both strands for its degree of matching with the ArcA-P weight matrix using the log transformation method of Berg and von Hippel (19). The distribution of the scores of all potential ArcA-P recognition sites was determined and the average of these scores, i.e. the genomic mean, assigned a Z score of 0. A histogram window comprising the best scoring sites (Z ≥ +3.5) is shown in Fig. 2A. As one would expect, all sites in the promoter regions of the eleven input operons were identified (Table I, operons are marked with an arrow in Fig. 2A), because the probability of being a true ArcA-P regulatory site statistically increases with an increasing Z score. The mean score μi of the ArcA-P recognition sites in front of the 11 input operons has a Z value of +4.42 (Fig. 2A). We next determined the standard deviation, σi, of these input sites and ranked the genomic hits with a Z value ≥ +3.5 into four standard deviation groups relative to μi (Fig. 2, A and B). The first standard deviation group contains potential ArcA-P binding sites that score above μi + 1σi (Fig. 2, A and B; Table II, rows 1-3). This group comprises only three sites (Fig. 2, A and B, bars), all of which are intergenically located (Fig. 2B, circles) upstream of lldPRD (matrix score: μi + 1.37σi), cydAB (μi + 1.37σi), and sodA (μi + 1.17σi). These three input operons were used to create the weight matrix (Table I). The second standard deviation group (Fig. 2, A and B and Table II, rows 4-15) contains 12 potential ArcA-P recognition sites that score between μi and μi + 1σi (Fig. 2B, bars). Among these, 11 sites (or 92%) are intergenically located immediately upstream of an open reading frame (Fig. 2B, circles). Only the site upstream of xylH is located intragenically as it covers the coding sequence of preceding xylG. This matrix site, however, may well be functional as it covers the mRNA sequence of the xylFGHR operon. Of the 11 intergenic sites, 4 are localized in front of input operons: traY (μi + 0.79σi), cydAB (μi + 0.54σi), glcDEFGB (μi + 0.42σi), and pflA (μi + 0.05σi). The others include sites in front of operons that are not connected with respiratory metabolism, such as cst/xthA (μi + 0.40σi) and insA_5/uspC (μi + 0.02σi) (Table II).
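The ranking described above involves two steps: matrix scores are converted into Z scores relative to the genomic mean, and each hit is then placed into a standard deviation group relative to the mean and standard deviation of the input sites. The Python sketch below illustrates both. The mean Z of +4.42 is taken from the text, whereas the standard deviation of 0.5 is a placeholder because its value is not stated here. The use of the population standard deviation and of strict inequalities at the group boundaries are also assumptions.

```python
import statistics


def z_scores(scores):
    """Express each score in standard deviations from the mean of all scores."""
    mean = statistics.fmean(scores)
    sd = statistics.pstdev(scores)
    return [(score - mean) / sd for score in scores]


def deviation_group(z, mu_i, sigma_i):
    """Return 1-4 for the four standard deviation groups, or None below mu_i - 2*sigma_i."""
    if z > mu_i + sigma_i:
        return 1
    if z > mu_i:
        return 2
    if z > mu_i - sigma_i:
        return 3
    if z > mu_i - 2 * sigma_i:
        return 4
    return None


MU_I = 4.42      # mean Z score of the input sites (from the text)
SIGMA_I = 0.5    # placeholder value, not given in the text

toy_groups = [deviation_group(z, MU_I, SIGMA_I) for z in (5.2, 4.5, 4.0, 3.5, 3.0)]
toy_z = z_scores([1.0, 2.0, 3.0])
```

Expressing the cut-offs relative to the input sites, and not to the genome as a whole, ties each group to how closely a hit resembles the experimentally footprinted sites.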
The third standard deviation group comprises 193 putative ArcA-P binding sites that score between μi and μi − 1σi (Fig. 2B, bars). Among these, 132 sites (or 68%) are located in intergenic regions (Fig. 2B, circles). Representatives include 5 sites located in front of input operons aldA [...] (Fig. 2A). The fourth standard deviation group comprises 1,122 potential ArcA-P boxes that score between μi − 1σi and μi − 2σi (Fig. 2B, bars). Among these, 606 sites (or 54%) are located in intergenic regions (Fig. 2B, circles). Members include 7 sites found in the promoter regions of input operons cydAB [...]. In total, 752 intergenic sites score higher than μi − 2σi. Interestingly, a positive first order relationship (r² = 0.99) was found between the matrix score of a site and the probability of it being located intergenically (Fig. 2B, circles). Moreover, the hits scoring above μi − 2σi appear to be located preferentially in sequence regions ≤200 bp upstream from start codons (Fig. 3A). The matrix, derived from footprinting data, therefore seems to recognize high scoring sites in potential promoter regions, suggesting that most identified sites are very likely to be active ArcA-P binding sites. However, a number of intragenically located sites are expected to be true ArcA-P binding sites (e.g. sites located in front of or covering long transcripts) as both trends (Figs. 2B and 3A) may in part reflect the A/T-rich bias of both the ArcA-P recognition sequence and promoter regions. An in-depth discussion of the significance of weight matrix discrimination between non-coding and coding sequences in E. coli is found in Robison et al. (28).
Putative target operons were also classified based on the number of ArcA-P matrix hits (scoring above μi − 2σi) that are clustered in their promoter regions (Fig. 3B, bars): 148 promoter regions harbor two ArcA-P boxes, and 74 contain three boxes or more. The observed increase in probability that multiple boxes are tightly grouped in intergenic regions (r² = 0.98; Fig. 3B, circles) adds to the suggestion that more than the 30 currently known operons may be subject to Arc regulation. Clustered ArcA-P boxes are on average separated by 15 bp, indicating that ArcA-P may act multimerically at some of its target regions. This observation supports the recent finding that ArcA-P may multimerize prior to DNA binding (29).
FIG. 2. Score-based distribution of matrix-identified ArcA-P recognition sites. A, distribution of genomic matrix-identified ArcA-P recognition sites that score above Z = +3.5. The genomic Z score represents the number of standard deviations above the genomic mean of Z = 0. μi denotes the mean Z score of the ArcA-P binding sites present in the promoter regions of the eleven ArcA-P-controlled input operons (Table I). σi denotes the standard deviation below or above μi. The open-head arrows indicate the Z-score positions of the recognition sites used to create the ArcA-P weight matrix. The gray arrow designates the potential genomic Z-score position of traY, an Arc-controlled input operon that is plasmid encoded. B, classification of high scoring matrix hits based on their matrix score and their probability of being located intergenically.

Measuring the Effect of ArcA-P on Gene Expression using Microarrays-In order to identify which E. coli genes are under the regulatory control of ArcA-P, we compared the transcription profile of a wild-type strain to that of an isogenic arcA deletion strain (the experimental setup is detailed in Fig. 4). For this purpose, both strains were cultured anaerobically because only under these conditions does ArcB phosphorylate ArcA. ArcA-P then proceeds to activate or repress its target operons. Furthermore, the growth medium was free of D-glucose (D-xylose served as the carbon source) and buffered with MOPS at pH 7.4 to avoid glucose-based catabolite repression or indirect pH effects on gene expression. For each strain, nine independent cultures were grown. These were then pooled into three groups of three, followed by total RNA extraction (Fig. 4). The resulting sets of RNA (three for each strain) were then subjected to reverse transcription and RT-PCR to determine the expression states of sdhCDAB and cydAB, two operons known to be under the negative and positive control of ArcA-P, respectively (1,4,25). The results showed an 11-fold negative effect of ArcA-P on sdhCDAB expression, and a 3-fold positive effect on cydAB (Fig. 5A), corroborating the experimental design and quality of the isolated RNAs. For each strain, the three sets of reverse-transcribed RNAs were then hybridized onto E. coli Antisense Genome Oligonucleotide Arrays® (Affymetrix) (Fig. 4). A total of 7,312 hybridization signals (4,345 ORF signals; 2,886 intergenic signals; and 81 internal control signals) were obtained from each array (Fig. 4). When the expression profiles from the three arcAΔ arrays (designated MT1, MT2, and MT3) were plotted against each other, intercorrelation coefficients of 0.81, 0.77, and 0.75 were obtained (Fig. 6, A-C). Similarly, the expression profiles from the wild-type strain (designated WT1, WT2, and WT3) gave intercorrelation coefficients of 0.86, 0.94, and 0.86 (Fig. 6, D-F), showing excellent degrees of reproducibility. Consequently, we averaged for each strain the triple data sets, yielding a final expression profile for the mutant and wild-type strain.
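The pairwise comparison of replicate arrays can be sketched as follows. The text does not state which correlation measure the software reported, so the Pearson coefficient used here is an assumption, as are the function names and the toy signal vectors (real profiles would hold one value per ORF).

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length signal vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def pairwise_correlations(arrays):
    """Correlate every pair of replicate arrays, keyed by the pair of array names."""
    return {
        (first, second): pearson(arrays[first], arrays[second])
        for first, second in combinations(sorted(arrays), 2)
    }


# Hypothetical toy profiles, for illustration only.
toy_arrays = {
    "MT1": [1.0, 2.0, 3.0, 4.0],
    "MT2": [2.0, 4.0, 6.0, 8.0],
    "MT3": [4.0, 3.0, 2.0, 1.0],
}
toy_corr = pairwise_correlations(toy_arrays)
```

Three replicates give three pairwise coefficients per strain, which matches the three values reported for each strain above.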
Before deriving biologically relevant information from these profiles, we further validated the microarray data by using RT-PCR to analyze the expression states of 15 additional operons (metR, csgD, rhsD, tolB, fur, dcm, rpoE, phoE, fimB, ompT, secA, ompF, phoH, lon, and ompC; Fig. 5A) in the arcAΔ and arcA+ strains. These operons were selected based solely on the scores of their ArcA-P recognition sites, such that they represent all candidate target operons that score above μi − 2σi. Under the implemented non-induced growth conditions, only fimB (encoding the switcher of Type 1 fimbriae) was identified as a potential Arc-controlled target (Fig. 5A). When the RT-PCR expression data were plotted against the data obtained using microarrays, an excellent degree of correlation (r² = 0.91) was obtained, validating our microarray transcription profiles (data not shown). Quantitative RT-PCR appeared to be slightly more sensitive (~1.5-fold, data not shown) than microarray-based hybridization (30,31), attesting that our microarray approach represents a stringent means for identifying Arc-controlled operons.

Table footnotes:
a Location of the box is in bp upstream of the start codon.
b ArcA-P-controlled operon whose footprinted promoter sequence was used to construct the weight matrix.
c The single box is located between 2 oppositely transcribed operons. Its location is in bp from the start codon of each operon. A negative location means that the matrix site is positioned downstream of the start codon within the coding sequence of the candidate target operon.

FIG. 3. [...] B, classification of high scoring matrix hits clustered in promoter regions (bars), and on the probability that these clusters are located intergenically (circles).
To determine which operons are affected by the Arc signal transduction pathway, we next subjected the arcAΔ and arcA+ array-transcription profiles to three steps of data filtration (Fig. 4). First, the results were subjected to a coefficient of variation (C.V.) cut-off of 0.8, thereby eliminating 14% of the ORF signals. Second, a noise-to-signal ratio cut-off of log2 [±1] was imposed on the arcAΔ/arcA+ expression data, resulting in an additional removal of 82% of the ORF signals. Third, a Student's t test with p < 0.05 was run on the remaining ORF signals, yielding a final set of 372 open reading frames (or 9% of the initial ORF hybridization signals) that are significantly affected, directly or indirectly, by ArcA-P (Fig. 7).
Matching the Matrix Screening Data to the Microarray Data Reveals New Members of the Arc Modulon-To determine which microarray-identified ORFs may be under the direct control of ArcA-P, we overlaid the filtered microarray data set with the matrix data set (Fig. 4). Of the 372 Arc-affected ORFs, 58 (or 16%) contained one or multiple matrix box(es) scoring above μi − 2σi (Table III). Among these, we recognized seven input operons (aldA, cydAB, glcDEFGB, gltA, icdA, lldPRD, and sdhCDAB), validating the methodology and outcome under the implemented experimental conditions. As a consequence, 51 operons (38 repressed, 13 activated) encoding a broad variety of functions (Tables III and IV) emerged as potentially new members of the Arc modulon.
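The overlay of the two data sets amounts to keeping those expression-affected operons that also carry at least one matrix box scoring above the cut-off. A minimal Python sketch is given below. The operon names, fold changes, box scores, and the cut-off value of 3.5 are placeholders and not data from this study; in the analysis above the cut-off is μi − 2σi.

```python
def direct_targets(affected, boxes_by_operon, cutoff):
    """Affected operons that carry at least one matrix box scoring above the cut-off."""
    targets = {}
    for operon, fold_change in affected.items():
        strong_boxes = [score for score in boxes_by_operon.get(operon, []) if score > cutoff]
        if strong_boxes:
            targets[operon] = {"fold_change": fold_change, "box_scores": strong_boxes}
    return targets


# Hypothetical inputs: expression changes from the filtered microarray set,
# and matrix box scores found in the promoter region of each operon.
toy_affected = {"opA": -4.0, "opB": 2.5, "opC": -3.1}
toy_boxes = {"opA": [4.6, 3.7], "opB": [3.1], "opD": [5.0]}
toy_targets = direct_targets(toy_affected, toy_boxes, cutoff=3.5)
```

In this toy case only opA is retained: opB has a box below the cut-off, opC has no box, and opD has a strong box but no expression change under the tested condition, which is the situation discussed for condition-dependent operons.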
As microarray-based profiling is environment-dependent, and matrix screening is not, it is clear that the microarray data set of Arc-controlled targets is incomplete. In fact, some known ArcA-P regulated operons (e.g. cyoABCD and sodA, the latter was used to construct the matrix), as well as operons containing top-scoring ArcA-P recognition sites (designated by N.K. in Table II, column 5) were not identified in our microarray data set. This is because some of these operons are transcribed only upon external induction (e.g. cyoABCD and sodA). This may be true for numerous other operons whose induction conditions are unknown, leading to an underestimation of the size of the Arc modulon. Fortunately, the matrix-screening profile (available at arep.med.harvard.edu/ecoli_matrices) may give us a clue as to the identity of these cryptic operons. To illustrate the utility of the matrix in this respect, we grew the arcA+ and arcAΔ strains anaerobically in minimal medium supplemented with 5% casein amino acids. RT-PCR analysis was performed on the operons whose expression states had been analyzed in cells grown in LB/xylose (Fig. 5A). In the presence of casein amino acids (Fig. 5B), csgD (regulator of curli biosynthesis, μi − 1.87σi) and ompT (outer membrane endoprotease, μi − 1.06σi) were respectively 17-fold and 15-fold activated by ArcA-P, whereas fur (regulator of ferric uptake, μi − 1.61σi) and fimB (switcher of Type 1 fimbriae, μi − 1.13σi) were, respectively, 7-fold and 9-fold repressed in an ArcA-P-dependent manner. The Arc modulon thus may contain more than 80 members. Based on the current data, a conservative but reasonable assessment is that the Arc response pathway may recruit 100-150 operons. Modulon subsets are triggered or repressed depending on the growth conditions.

DISCUSSION
Effectiveness of ArcA-P Recognition Weight Matrix Screening-Matrix-based genome analysis has proven very useful for identifying truly functional regulator binding sites in the E. coli chromosome (27,28,32). The sdhCDAB promoter, for instance, contains three ArcA-P protected segments as determined by footprinting experiments (15,25). However, only one segment, covering the −35 to −10 region, was shown to be functional by genetic analysis (25). By our matrix criterion, this short region contains two ArcA-P boxes (μi − 1.67σi, μi − 1.51σi) (Table I and Fig. 2A), whereas the other two footprinted, non-functional regions contain none. In the case of the pflA promoter region, four distinct ArcA-P protected segments were found (14). By our weight matrix criterion, two ArcA-P boxes (μi + 0.05σi, μi − 1.53σi) (Table I and Fig. 2A) were identified only in the one region that showed the highest in vitro affinity for ArcA-P. One reason why ArcA-P matrix scanning is so reliable is that the matrix was derived from footprinting experiments. As was shown for the CpxR-P weight matrix, one can anticipate a strong correlation between the matrix score of an ArcA-P recognition site and the in vitro affinity of ArcA-P for that site (32).
The advantage of using the weighted ArcA-P matrix (Fig. 1) over the rigid ArcA-P consensus is illustrated by the 46,943 genomic hits that were obtained using the consensus (data not shown), whereas the matrix recognized 1,333 statistically significant binding sites scoring above μi − 2σi (Fig. 2, A and B). Importantly, all ArcA-P-controlled operons used to create the matrix are distributed uniformly throughout this Z-score window (Fig. 2A). Another indication of the soundness of the matrix screening is the increase in the percentage of ArcA-P recognition sites that are located intergenically with an increasing Z score (Fig. 2B). However, not all of the targets scoring above the cut-off level can be expected to be true binding sites as the set of eleven input operons may not accurately reflect the statistical distribution of all true targets.
Probing the Arc Modulon by Superimposing Matrix- and Microarray-derived Data-To identify true ArcA-P target operons, we complemented the matrix screening with array-based expression analysis. For this purpose, arcA+ and arcAΔ strains were grown anaerobically to activate the Arc signaling pathway. Because growth under aerobic conditions does not trigger the Arc response, Arc-controlled targets should not be identified under such conditions (33). Total RNA was isolated from the anaerobic arcA+ and arcAΔ cultures, converted to cDNA, and hybridized to oligonucleotide-based microarrays. A total of 4,345 ORFs (Fig. 7, open bars) were obtained but data filtration analysis suggested that only 372 ORFs (Fig. 7, filled bars) were significantly affected by ArcA-P. Of these, 234 ORFs (or 63%) are repressed, whereas 137 are activated. This finding shows that 9% of all ORFs in E. coli (372/4,345) are affected, either directly or indirectly, by ArcA-P and that the Arc-mediated response translates into a net global downscaling of gene expression.
To determine which operons are under the direct control of ArcA-P, we matched the matrix-screening data with the array-derived expression profiles. Of the 372 ArcA-P-affected operons, 58 contained one or more matrix boxes upstream of the start codon, strongly suggesting a direct involvement of ArcA-P in their expression. Importantly, these boxes (marked with an arrow in Fig. 8) cover the window of statistically significant targets (scoring above μi − 2σi) and overlie the positions of the matrix sites located in the promoters of the eleven input operons (compare Fig. 8 with Fig. 2A). In addition, expression analysis under a different growth condition (casein amino acid medium versus D-xylose in LB) revealed an additional four operons, with high-scoring matrix sites, that are most likely under the direct control of ArcA-P. The identification of fur (encoding a regulator of ferric uptake) as an ArcA-P target is particularly worthy of note as Fur regulates the expression of sodA (13). sodA, encoding the manganese-containing superoxide dismutase, is induced under conditions of iron deprivation and is also under the transcriptional control of ArcA-P and FNR (34), suggesting that the actions of Fur, ArcA-P, and FNR are carefully orchestrated during the adaptation to iron deprivation.

Table footnotes:
a Functional annotations follow Blattner et al. (35).
b Operons whose transcription is significantly affected by ArcA-P as measured by microarray analysis. The operons were identified after statistical filtration of the dataset (2-fold expression cut off, coefficient of variation <0.8, and Student's t test, p < 0.05).
c Operons affected by ArcA-P (see microarray column) that contain in their promoter sequence an ArcA-P recognition site that scores above μi − 2σi.

FIG. 8. Score-based distribution of ArcA-P-controlled operons identified in this study. Score-based distribution of ArcA-P recognition sites identified in promoter regions of operons whose expression is most likely under the direct control of ArcA-P.
Computational analysis of the 62 Arc-controlled operons did not reveal any correlation between the score of a matrix site or its position relative to the start codon in an upstream sequence region, and the degree or type of Arc regulation at that site. Weight matrix profiling alone therefore does not yield decisive information about Arc activity at matrix-identified operons.
ArcA-P Controls a Functionally Diverse Set of Operons-The results described above illustrate that ArcA-P controls a myriad of operons that are involved in a variety of functions. Categorizing the newly identified operons into 21 functional groups (Table IV) according to Blattner et al. (35) shows that the Arc adaptational response extends beyond redox metabolism: fliMN and fliE encode proteins involved in flagella synthesis and switching, respectively; ftsZ encodes the protein that forms the cell division ring; surA encodes a protein that mediates stress-induced survival; nikABCDE encodes the nickel transport system; and atoSC is involved in short-chain fatty acid degradation. This broader than expected role is further supported by the recent findings that ArcA-P modulates the expression of virulence factors in Vibrio cholerae (36) and of membrane proteins in Haemophilus influenzae that are implicated in serum resistance (37). The disparate functions of the newly identified operons suggest that their organization into a single Arc modulon somehow must have created a selective advantage, resulting in a more competent response under suboptimal respiratory growth conditions. The Arc-activated cydAB (1), and Arc-repressed sodA (13) and lpdA (26), are also under the direct negative control of FNR, a regulator that co-mediates gene expression in response to the redox conditions of growth (38). The identification by matrix screening and microarray analysis of caiT, ndh, nikA, manX, and sdaC as ArcA-P regulated targets (Table III) extends the overlap between the Arc and FNR networks as these operons were recently shown to be under the direct control of FNR (38). This integration of operons and response circuits seems to be a prevailing denominator as contemporary systems-biology approaches are consistently proving that regulatory pathways intertwine (e.g. the CpxA/CpxR, E [...]

In conclusion, our systems approach serves as a first step toward the long term goal of establishing a functional map of the Arc system, which appears more multifaceted than anticipated. Our findings extend the Arc modulon to beyond 80 members (30 known plus 55 novel operons), but for reasons acknowledged earlier, this number is likely an underestimation. Ad interim, a conservative but reasonable assessment is that around 100-150 operons may be under transcriptional control of ArcA-P. Further molecular characterization of candidate Arc-regulated targets identified in this study will ultimately reveal how and when the Arc system functions in coordinating cellular adaptation in situations of environmental adversity.