Modulation of Gene Expression Regulated by the Transcription Factor NF-κB/RelA

Background: Interacting proteins modulate the activity of NF-κB/RelA transcription factor and expression of its targets. Results: By analyzing gene expression, protein binding and DNA binding, we inferred and characterized 8349 such modulations. Conclusion: Different modulator groups affect separate pathways. Significance: We provide new insight into the activity of NF-κB/RelA. Our inference model can be applied to other processes and pathways. ABSTRACT Modulators (M) are proteins that modify activity of transcription factors (TFs) and influence expression of their target genes (TGs). To discover modulators of NF-κB/RelA, we first identified 365 NF-κB/RelA binding proteins using Liquid Chromatography/ Tandem Mass Spectrometry (LC-MS/MS). We employed a probabilistic model to infer 8,349 (M, NF-κB/RelA, TG) triplets and their modes

RelA (p65) is one of five components of the NF-κB transcription factor complex in mammals (10). Due to the importance and complexity of the cellular processes regulated by NF-κB pathways, elucidating and characterizing modulators of NF-κB will provide important insights into the role of modulation in fine-tuning transcriptional regulations and conferring specificity, as well as identify potentially promising drug targets.
Gene expression profiles measured under various conditions provide important information on the regulatory relations between genes. Gene regulatory networks have been constructed using reverse engineering methods based on correlation of gene expression profiles (13). These methods include conditional information (13,14), correlation coefficient, machine learning (15) and statistical models (16). Recently, heterogeneous data integration of promoter sequence and gene expression profiles has also been used to infer gene regulation patterns (17,18).
Modulators (M) are proteins that modify the activity of a transcription factor (TF). Modulators often act by binding directly to the TFs, to affect the expression of a target gene. The outcome of modulation can be classified into the six possible action modes: inhibition attenuation, inhibition enhancement, inhibition inversion, activation inversion, activation enhancement, and activation attenuation (19). Including analysis of the modulators facilitates elucidation of a biologically realistic regulatory network. Therefore, instead of pairs (TF, TG) that lead to traditional regulatory networks, triplets (M, TF, TG) and the more complex regulatory and modulatory networks inferred based on significant triplets should be considered.
Several computational methods for inferring (M, TF, TG) triplets from gene expression profiles have been developed in recent years, including three-way mutual information (20), MINDy (21), GEM (19), MIMOSA (22), and MONSTER (23). MINDy uses the difference between conditional mutual information at low and high modulator expression, and has been applied to identify genome wide modulators of MYC. However, it has a major limitation in that it cannot identify modulators with more than one mode of activity (activator or repressor). MIMOSA applies a mixture model and maximum expectation method to predict modulators of STAT1. The method is computationally intensive and cannot differentiate between modulators and targets of the TF. Finally, MONSTER only considers kinases as potential modulator candidates. Neither MIMOSA, nor MONSTER, has been experimentally validated. GEM (19) aims to identify modulators of the androgen receptor, AR, by constructing a nonlinear model of the three-way interaction between the activity levels of the two "inputs", (TF and M), and the "output" (TG).
In this work, we predicted modulators and the corresponding TGs of the NF-κB/RelA transcription factor, based on the approach in (19). As input for modulatory triplet inference, we used de novo identified RelA binding proteins and its target genes, identified respectively by affinity Here, and are the transcription factor, its modulator and the affected target gene, respectively.
is the basal expression level of the TG, and represent the effect of and alone on , respectively. represents the effect of interaction between and on and varies for different genes. Its magnitude may be interpreted as the modulation effect. Our task was to find genes with G significantly different from zero. Accordingly, we approximated the probability that TG is highly expressed, , as: where represents the basal expression of , is the effect of only (when low), is the effect of only (at low ) and is the interaction effect of and when both are highly expressed.
and are equal to 1 if the ranked expression of the corresponding gene is in the upper tertile. Conversely, if it is in the lower tertile, and are set to zero. If is not statistically significantly different from zero, the data do not have enough evidence to support the hypothesis that modulates the effect of on the target . To estimate , we first calculated , the proportions of with different combination of states of and . Specifically, , where with and , serves as the observed case, while , and are the respective control cases with low or or both. were then estimated by ̂ Here, ̂ corresponds to the proportion of to plus when both and is low expressed.
Estimation of the above parameters can be found in (19). Assuming that influences via , must act on when is highly expressed, i.e., γ and β must be non-zero, while β is greater in absolute value or has a different sign than α ̂ ̂ ̂ ̂ The estimation of the variance and p-value for rejecting the above three null hypotheses can be found in (19). The predicted triplets are ordered according to the p-values of ̂ and ̂ , the smaller the p-value, the higher the significance of the predicted triplet. The requirement for a significant ̂ is crucial for detecting the interaction effect as opposed to a combination of independent actions of M and F separately.
positive, inhibition if negative and inactive if zero. By comparing ̂ and ̂, modulators are also classified into three categories of action: enhance, attenuate or invert the activity of the TF. Specifically, if ̂ is non-negative and ̂ value is greater than ̂ , the action mode is activation enhancement. If ̂ is positive and ̂ less than ̂ and non-negative, it signifies activation attenuation, and if ̂ is positive and ̂ is negative, activation inversion. If ̂ is non-positive and ̂ value is smaller than ̂ , the action mode is inhibition enhancement. If ̂ is negative and ̂ greater than ̂ and non-positive, the action mode is inhibition attenuation, and if ̂ is negative and ̂ is positive, inhibition inversion (see Table I).

A pass-strategy based on Shannon entropy filter
We further filtered the predicted modulators by the Shannon entropy of the distribution across the six action modes by each modulator respectively predicted from binding data. Modulators with Shannon entropy significantly smaller than random ones were classified as specific. The null distribution of the Shannon entropy depends on the number of targets and is obtained by generating the six action modes given the number of TGs in a simulation repeated 10,000 times. The modulators were ranked according to the largest number of TGs. The calculation of the Shannon entropy of the action mode distribution for each modulator is as follows: where is the proportion of the number of targets in each action mode , and is the entropy of action mode distribution for each modulator.

Hierarchical clustering of M-TG networks
To identify the modules of the network composed by RelA modulators and their affected target genes, we applied hierarchical clustering (24) to matrices of modulator-target interactions. Hierarchical clustering has been chosen to reflect the coincidence between the modular structure of the modulation network and the multiple levels of functional annotations of the genes involved (see also Results). The matrix entries correspond to the value of , which represents the effect of an M when F is highly expressed, and , the interaction effect of M and F when both are highly expressed.
Here we used the Matlab clustergram function with default parameters and parsed the M-TG network into a hierarchy of groups according to the similarity of the value of (the effect of M-F interaction) and the value of (the effect of M when F is high) that measures the common neighbors of M and T, respectively. The hierarchical clustering is agglomerative since we proceeded through the algorithm by adding links to the network. We applied the clustering procedure to triplets satisfying (-log10 p-value of ) > 4 and (-log10 p-value of ) > 4 . To obtain the clustering presented in Fig. 4, we only used the modulators and TGs involved in significant predicted interactions and not involved in feedback loops. Specifically, we only used the 861 TGs for which there was at least one triplet satisfying the p-value thresholds. The modulators have been selected analogously, with at least one significant triplet, giving 493 candidate modulators of RelA.

Gene expression profiles
We used gene expression profiles of 2158 tumor samples published by expO (expression project for oncology) to characterize each gene. ExpO is based on the GeneChip Human Genome U133 Plus 2.0 array platform. The variety of the tumor samples guarantees enough perturbation for studying three-point (M-F-T) interaction.
Since gene expression is noisy, as reported in (25) discretized the expression values by rank-ordering  across genes and dividing the ranked 2158  expression values of each gene across experiments  into three bins, labeled as "1" (upper 719  experiments where the gene ranked highest), "0"  (lower 719 experiments where the gene ranked  lowest) and "middle" (720 remaining experiments). Based on the areas under ROC and Precision-recall curve in predicting RelA modulators (see Figure 1), we found that varying the bin size only weakly affected the prediction of RelA triplets. Thus, we used tertiles as bins. Unless specified otherwise, we predicted triplets from probeset profiles with the above discretization. For some of the genes, multiple probesets are available on the UG133Plus2 microarray. Since the individual probesets may correspond to different isoforms of the gene and thus be responsible for different functions, we treated each probeset of modulators and target genes separately in the prediction algorithm. Also, there are three probesets corresponding to RelA on the array platform: 201783_s_at, 209878_s_at and 230202_at. We considered the first two individually and ignored the third one, 230202_at, because of its low expression in the expO experiments.

Collection of binding proteins and target genes of RelA from literature
We have obtained the list of RelA-binding proteins from the PIANA database (http://sbi.imim.es/piana/). 287 of those 297 proteins (Supplementary Data File S1, sheet 1) have probesets available in the expO database. Here, we refer to these previously reported binding proteins as "existing", as opposed to the newly identified proteins that bind to RelA. Similarly, the existing RelA targets (TGs) were collected from the Boston University website (http://www.bu.edu/nf-kb/). 424 TGs (Supplementary Data File S2, sheet 2) have been obtained with available probesets and expression values in the expO database. These existing binding proteins and TGs, combined with the newly detected ones, serve as our primary set of candidate genes involved in modulatory network of RelA regulated transcription. The functional annotations of the modulators and TGs were characterized using BiNGO software (26) in Cytoscape (27).

Identification of RelA binding protein by affinity tandem mass spectrometry
Material ─ All reagents were American Chemical Society (ACS) grade or higher. All solvents used, including water, methanol, acetonitrile (ACN) were LC/MS grade. Sequence grade modified trypsin was purchased from Promega (Madison, WI). Recombinant TNFα was from PeproTek (Rocky Hill, NJ). Antibodies used were: rabbit anti NF-κB/RelA (sc-372) Abs from Santa Cruz Biotech (Santa Cruz, CA).
On-beads Tryptic digestion-The beads were resuspended in 30 µL of 50 mM ammonium hydrogen carbonate (pH 7.8). 20 µL of 0.1 µg/ µL of trypsin was added. The samples were mixed and trypsinized by gentle shaking overnight at 37 °C. After digestion, the supernatant was collected. The beads were washed with 50 µL of 50% ACN three times and the supernatant was pooled, and dried [4].
In-gel Tryptic digestion ─ The protein mixtures obtained from immunoprecipitation of anti-RelA antibody and nonspecific IgG were separated on an SDS-PAGE gel. The proteins were visualized by Sypro® Ruby staining. The protein bands were cut from the gel and subjected to in-gel trypsin digestion as described previously [5]. Briefly, the gel slice was cut into small particles ( 1 mm 3 ) using a scalpel. The resulting gel particles were destained in 1 mL of water/methanol solution (50:50, v/v) containing 25 mM NH 4 HCO 3 (pH 8.0) three times, with the solution changed every 10 min. The destained gel was washed in 1 mL of an acetic solution (acetic acid/methanol/water, 10:40:50, v/v/v) for 3 h, with the solution changed every 1 h. The resulting gel was soaked in 1 mL of water twice with the solvent changed every 20 min. The gel was then transferred to a 0.5-mL microcentrifuge tube and dehydrated by soaking the gel in 100% ACN until it became opaque white. The solution was removed and the gel was dried in a Speed-Vac for 20−30 min. The dried gel was rehydrated with an adequate amount of trypsin digestion solution (10 ng of trypsin/μL in 50 mM NH 4 HCO 3 , pH 8.0). The digestion was carried out at 37 °C overnight. To extract tryptic digest, the gel was soaked in 40 μL of extraction solution (ACN/TFA/water, 50:5:45, v/v/v) for 60 min with a vortex and the extraction solution was carefully removed with a gel-loading pipet tip. Extraction was repeated once. The extracts were pooled and dried with a SpeedVac.
LC-MS/MS ─ HPLC/MS/MS analyses were performed in an LTQ velos Orbitrap mass spectrometer (Thermo Scientific, San Jose, CA) coupled on-line to a nano-HPLC system (Agilent 1100 Nano Pump, Agilent Technologies, San Jose, CA) and nanospray source. Three microliters of the peptide solution in buffer A (5% acetonitrile/94.9% water/0.1% acetic acid (v/v/v)) was manually injected and separated in a nano-HPLC C18 column (100 mm length × 75 μm inner diameter, 5-μm particle size), 100-Å pore diameter) The peptides were eluted from the column with a linear gradient of 5−70% buffer B (99.9% acetonitrile/0.1% acetic acid (v/v/v)) in buffer A over 45 min. The eluted peptides were electrosprayed directly into the mass spectrometer. The MS/MS spectra were acquired in a datadependent mode. The full MS scan was performed in Orbitrap with mass resolution of 60,000. The six strongest ions in each MS spectrum were automatically selected for CID and analyzed by LTQ velos.
Protein identification and spectral counting ─ The resulting spectra were used to identify protein candidates in the target-decoy Swissprot human protein database (downloaded on August 25 th , 2011) with the Proteome Discoverer 1.1 search engine (Thermo Scientific, San Jose, CA). The cutoff of the false discovery rate (FDR) of peptide identifications was 1%. Normalized spectral abundance factor (NSAF) value for each protein was calculated as described [6], ∑ in which the total number of tandem MS spectra matching peptides from protein k (SpC) was divided by the protein length (L), then divided by the sum of SpC/L for all uniquely identified proteins in the dataset.
LC-SRM-MS ─ The LC-SRM-MS analysis was performed with a TSQ Vantage triple quadrupole mass spectrometer equipped with nanospray source (Thermo Scientific, San Jose, CA). The online desalting and chromatography were performed using an Eksigent NanoLC-2D HPLC system (AB SCIEX, Dublin, CA). An aliquot of 10 μL of each tryptic digest was injected on a C18 peptide trap (Agilent, Santa Clara, CA), desalted with 0.1 % formic acid at a flow rate of 2 μL/min for 45 min. Peptides were eluted from the trap and separated on a reversed phase nano-HPLC column (PicoFrit™, 75 µm x 10 cm; tip ID 15 µm) packed in house using Zorbax SB-C18 (5-µm diameter particles, Agilent, Santa Clara, CA). Separations were performed using a flow rate of 500 nL/min with a 20-min linear gradient from 2-40% mobile phase B (0.1 % formic acid-90 % ACN) in mobile phase A (0.1 % formic acid), followed by 0.1-min gradient from 40-90% mobile phase B and 5-min 90% mobile phase B. The TSQ Vantage was operated in high-resolution SRM mode with Q1 and Q3 set to 0.2 and 0.7-Da Full Width Half Maximum. All acquisition methods used the following parameters: 1800 V ion spray voltage, a 275 C ion transferring tube temperature, a collision-activated dissociation pressure at 1.5 mTorr, and the S-lens voltage used the values in Slens table generated during MS calibration [4].

Identification of RelA target genes by ChIP-seq
We prepared the list of candidate target genes of RelA based on the result of our previous ChIP-seq experiment (32). To identify DNA sequences binding to RelA, covalent cross-linking, segmentation of DNA and anti-RelA antibody precipitation was performed in A549 cells. ChIPseq data of RelA were obtained by Illumina sequencing, resulting in two fastq files, one for TNF treatment at 0 minutes (TNF-0) and one for the TNF treatment at 30 minutes (TNF-30). As reported in (32), our in-house Instant Sequencing software was used to trim the reads at the location where one base was called with confidence interval greater than 99% (corresponding to phred score >= 20, p-value 0.01) (33) and only reads that were at least 35 bp long were preserved in the corresponding fasta file. The TNF-0 and TNF-30 samples contained 17,862,626 and 12,624,879 quality filtered reads, respectively. We used bowtie v0.12 software (34) to map the filtered reads to the human genome assembly GRCh37/hg19. We rejected reads that had mismatches or were non-uniquely mappable, leaving 77% of reads in the TNF-0 sample and 81% for the TNF-30 treated sample. ChIP-Seq binding peaks were called using MACS v0.14 (35), with parameters for calling peaks in human data, with TNF-0 sample as the control and TNF-30 sample as the treatment. In total, 20,733 peaks were called.
After parsing by Instant Sequencing, we selected the best 4195 peaks with a MACS significance score of at least 200. For gene assignment, Instant Sequencing was then used to find transcription start sites of known human genes proximal to ChIP-Seq peaks, using annotations from the GENCODE v12 database (36,37). Peaks whose summits were within a distance of 3 Kb upstream or downstream of a known transcription start site of a gene were identified. For each ChIP-Seq peak, a list of transcription start sites was ranked by the increasing order of their distance from the summit of a ChIP-Seq peak. Next, the gene with its transcription start site closest to the summit of a peak was found and reported, along with the MACS significance score of that peak. When multiple peaks were proximal to the same gene, the MACS significance scores for each peak were added together and the sum was reported as a single score for that gene. Finally a list of proximal genes ordered by their combined significance scores (Supplementary Data File S2, Sheet 1) was generated for finding enriched GO functions.

Enrichment analysis of predicted TGs composing the predicted triplets
Enrichment of the predicted TGs in newly identified and existing TGs was assessed by hypergeometric cumulative distribution function ( ), using the hygecdf function in Matlab. The function computes the hypergeometric cdf given the size of the population, the number of items with the desired characteristic and the number of samples. The final p-value is calculated by .

Experimental validation on STAT1-RelA-Target triplets
An interaction between STAT1 and RelA has been previously implicated by co-localization experiments based on confocal fluorescence microscopy (38). To verify the interaction between RelA and STAT1, we used immunoprecipitation and quantitative SID-SRM-MS to measure the enrichment of STAT1 in the sample obtained from the immunoprecipitation of anti-RelA antibody. The nuclear extract of A549 cells were incubated with anti-RelA antibody and nonspecific IgG (negative control), respectively as depicted in the Methods. The proteins associated with the antibodies were pulled-down and subjected to LC-SRM-MS to quantify the level of STAT1 in each sample.
Targets of the STAT1-RelA interaction were validated by STAT1 knockout and poly(IC) stimulation. Specifically, the real action modes of the predicted triplets of STAT1-RelA-TGs were validated by comparing the expression level of a target gene candidate in four conditions, consisting of STAT1 wild type human 2f (STAT1+/+) cell line and knockout U3A (STAT1-/-) combined with the absence or presence of poly(IC) stimulation in human fibrosarcoma cells. The experimental action modes were then compared with the predicted action modes.

Implementation of the probabilistic model for (M, TF, TG) triplet prediction
We implemented an algorithm for predicting (M, TF, TG) triplets based on a compendium of 2,158 expression profiles. The underlying probabilistic model (19) of triplet action depends on four basic parameters and , that correspond to the basal level of a TG, the dependence of TG expression on TF, the dependence on the M, and the interactive effect of the M and TF on the TG. For accurately predicting (M, TF, TG) triplets, two partial effects, and one total effect of the M and TF on the TG, were also estimated respectively by the parameters , , and (Methods). The existence of a functional (M, TF, TG) triplet is inferred if both (the estimate of ) and ̂ (the estimate of ̂ ) are significantly different from zero. The corresponding p-values were estimated as described in the Methods section.
According to the estimated values of the parameters, , , for the given triplet, that correspond to effect of TF on TG and the interactive effect of M and TF, the inferred triplets were categorized into one of six action modes: inhibition attenuation, inhibition enhancement, inhibition inversion, activation inversion, activation enhancement, and activation attenuation. Approximately 2% of the triplets did not have an assigned category, because the estimated parameters did not pass the adopted significance thresholds. The unclassified triplets were not considered in further analysis. Among the modulators of NF-κB/RelA, 36% have prevalent action modes of their triplets, while other modulators act in many or all action modes.
We formalized the distinction between specific and general modulators using the entropy of the distribution of the action modes as a measure of specificity. The entropy is compared to the entropy expected of a non-specific modulator with the same number of affected targets (Methods).
In three of the six action modes (inhibition attenuation, inhibition inversion, and activation enhancement), the parameter is significantly positive, and the TG is predominantly upregulated by the modulator M; that is M is a dominant activator. Conversely, for inhibition enhancement, activation inversion and activation attenuation is significantly negative and the modulator is a dominant inhibitor.
Moreover, a modulator is classified as an agonist of NF-κB/RelA if it prevalently regulates the TGs in the same direction as NF-κB/RelA (upup-or down-down-regulation), i.e., with inhibition enhancement and activation enhancement modes. Conversely, if M regulates its involving TGs prevalently in the opposite direction to NF-κB/RelA (up-down-or down-up regulation) it is considered an antagonist. Note that a dominant activator can be an agonist or antagonist of NF-κB/RelA, and likewise an inhibitor. For all four cases (activator, inhibitor, agonist, antagonist), we used the hypergeometric test to assess the significance of the dominant category.

Input data for inferring NF-κB/RelA modulatory network
Our goal was to characterize the dependencies between the modulators and corresponding target genes of NF-κB/RelA. We expected that the most significant (M, TF, TG) triplets would contain modulators and TGs with prior association with ,, NF-κB/RelA. Therefore, we focused on triplets in which the candidate modulator was independently indicated as an NF-κB/RelA interactor, and the candidate target was a target gene of NF-κB/RelA. Using such restricted sets of candidate Ms and TGs allowed minimizing the false discovery rate in inferred triplets. Before the triplet prediction and the following modulatory network inference, NF-κB/RelA modulators (NF-κB/RelA binding proteins) and the TG candidates were first collected as follows.

Candidate modulators
We postulated that the modulators of NF-κB/RelA would be enriched in proteins physically interacting with the NF-κB/RelA protein.  (40), and Boston University website (http://www.bu.edu/nf-kb/). We thus collected 297 proteins that have been previously reported to interact with NF-κB/RelA (Supplementary Data File S1, sheet 1). We refer to these 297 proteins as "previously known binding proteins" of NF-κB/RelA. There are 31 proteins present in both our LC-MS/MS data and the PIANA dataset.
The functional categories enriched in the newly identified NF-κB/RelA associated proteins are shown in Table II The combined set of LC-MS/MS data and exiting RelA associated proteins contains 632 unique proteins. Within this group, the mRNA expression profiles are available for 620 of the proteins, and these were selected for further analysis (Supplementary Data File S1, sheet 5). Several factors may be responsible for the relatively small overlap (10%) between the literature-derived set and our LC-MS/MS experiments. Our experimental identification of NF-κB/RelA interactors was restricted to nuclear extract, and the results may have been affected by different experimental techniques, interaction sampling variety and indirect binders of NF-κB/RelA identified by LC/MS spectrometry. However, as discussed above, the newly discovered set of RelA interactors exhibit properties characteristic of those previously known, which expands the set of experimentally verified NF-κB/RelA interactors by more than 2.5 times. We are confident that our work will be a very valuable contribution to elucidating NF-κB/RelA regulatory and modulatory networks.

Candidate target genes
The set of target genes used for triplet prediction has been compiled from two sources: our NF-

Triplets predicted using the candidate NF-κB/RelA modulators and TGs of NF-κB/RelA
Using the described above sets of candidate NF-κB/RelA modulators and TGs of NF-κB/RelA, we inferred the 8349 (M, NF-κB/RelA, TG) regulatory triplets of the six action modes (Supplementary Data File S4). At current threshold, for the triplets that satisfy the three hypotheses H1, H2 and H3, there are two percent (196) outliers among the predicted triplets that do not belong to any of the six action modes. In these cases, the α and α β are not significantly different from zero, while β , γ and β are significantly different from zero. The modulators have main and interactive effects at the present of the transcription factor, while the total effect is zero. Confounding factors may explain these cases. We excluded these type of modulators from all analyses and only considered the triplets with one of the six action modes. There is at least one predicted TG for 562 of the 620 binding proteins.
We closely examined these top 10 NF-κB/RelA modulators and found that there are literature reports in support of each of them as bona fide NF-κB/RelA modulators. Moreover, the literature generally supports the prevalent action modes identified here.
Specifically, GAPDH participates in NF-κB signaling (43,44). Recently, it has been reported that the mechanism is related to GAPDH"s role in TNF-induced NF-κB activation (45). This function may be disrupted during pathogen infection; as an important virulence strategy employed by attaching/effacing pathogens to inhibit the host NF-κB-dependent innate immune responses. Consistently, the predicted triplets demonstrate that as a modulator, GAPDH predominantly activates NF-κB TGs. Specifically, 155 triplets are of activation enhancement mode and 101 are inhibition inversion.
ESR1 and NF-κB have been found to repress each other in a cell-type specific manner in several contexts (46,47). Among the predicted 211 ESR1-RELA-TGs triplets, 91 are activation attenuation (P = 1.1e-10), consistent with the observed repression of NF-κB activity by ESR1.
In a yeast Alzheimer"s model induced by secretory amyloid-β (Aβ), a PICALM ortholog is involved in cellular toxicity by interacting with mitochondrion. Activation of the NF-κB pathway has been linked to Aβ neurotoxicity (48,49) in both neuronal cells and microglial cells. Consistently we predicted that the PICALM ortholog is a modulator of NF-κB with prevalent activation enhancement.
ACTG1 is one of the three isoforms of actin, and MACF1 is a microtubule and actin connecting factor involved in cytoskeleton dynamics homeostasis. Perturbation of cytoskeleton dynamics activates NF-κB (50,51), which can be mediated by IkappaB degradation, p38 MAP kinase activation (51) or NADPH-oxidase dependent pathways (51).
RDX, encodes Radixin, a component of the ezrin/radixin/moesin (ERM) complex that acts as a linker between the actin cytoskeleton and plasma membrane proteins, and signal transducers involving cytoskeletal remodeling (52). In colon cancer cells, ERM is essential for cell-neural adhesion molecule (L1CALM) mediated metastasis through activation of NF-κB (53). In THP-1 monocytes treated with TNF-α, ERM rapidly translocates the RhoA small GTPase to the plasma membrane and induces actin polymerization through NF-κB (54). The predicted triplets of RDX demonstrate it is an agonist of NF-κB transcriptional activity, where 38 are inhibition enhancements and 31 activation enhancements (P=1.9e-6).
TCF4 encodes the E2-2 protein, a member of the E2 box protein family, an essential and specific regulator of plasmacytoid dendritic cell (pDC) development (55). One of the E2-2 associated proteins, Spi-B, regulates pDC survival through direct induction of the anti-apoptotic gene BCL2A1 (56), which is also an important target gene of NF-κB. Consistently, TCF4 predominantly activates NF-κB in 51 of 106 predicted triplets with activation enhancement (P=3.1e-8).
HNRNPC encodes the heterogeneous nuclear ribonucleoproteins C1/C2 (hnRNP C1/C2), which mainly mediates nuclear mRNA retention in 40S hnRNP particles by competing with exporting proteins as an mRNA length ruler (57). Direct modulation of hnRNPC1/C2 is not reported yet. There are however reports on modulation of NF-κB by alternative hnRNPs: hnRNPA1 enhances NF-κB activity through the mRNA stability of the modulator cIAP1 of NF-κB in HeLa cells (58). hnRNPU stabilized the mRNA of the TGs regulated by NF-κB in the TLR signaling pathway in macrophages (59). hnRNPU/hnRNPA1 is also involved in the stress response as components of a stress sensor complex B23/hnRNPU/hnRNPA1(60). hnRNPA1 attenuates NF-κB activity through demobilizing the mRNA level of cIAP1, while hnRNPU, in our prediction, prevalently attenuates the NF-κB TGs, where 65 out of 76 triplets are of activation attenuation (P=0).
RPL38 was reported to be predominantly expressed in pancreatic ductal epithelium (61) and to perform transcript-specific translational control of HOX mRNA translation in mouse tissue patterning (62). No direct proof exists that RPL38 directly regulates NF-κB TGs. However, RPLS3 protein has been found to selectively regulate NF-κB TGs (63). Also, other ribosomal subunits have been previously implicated in selectively regulating NF-κB TGs (63).
Direct binding of ATP5C1 and RelA has not been reported. Nevertheless, there is indirect evidence of an interaction including that both ATP5C1 and RelA were reported to interact with MED15/ARC105, mediator complex subunit 15 (a RNA polymerase II transcription cofactor activity involved in transcription initiation). These interactions were detected by yeast two hybrid and pull down respectively in (64) and (65). In a recent study of the transcriptional response to relaxation training, ATP5C1 was one of the top upregulated critical molecules, and served as a hub in the interactive network of the relaxation response affected pathway. At the same time, NF-κB TGs were the top down-regulated hubs (66). Consistently, 48 of 65 ATP5C1-RelA-TG triplets showed ATP5C1 to be a repressor, which included 15 inhibition enhancement, 9 activation inversion, 24 activation enhancement (P=6.6e-8) (Figure 4).
Thus, for most of the predicted modulators, the functional association with NF-κB/RelA is supported by literature. These modulators may constitute key links in the cross-talk pathways modulating RelA/NF-κB activity in virus and bacteria infection, stress, immunity and cancer. The pathways include mitochondrion (ATP5C1), estrogen receptor pathway, actin cytoskeleton organization (ACTG1, MACF1 and RDX), clathrin mediated endocytosis (PICALM), and hnRNP C1/C2 mediated mRNA nuclear retention and genomic integrity.

The NF-κB/RelA modulatory network and its functional characterization
Many of the above triplets may be intertwined in the same pathway. To unravel the organization of transcription regulated by NF-κB/RelA and its modulators, we constructed a modulatory network, in which one type of nodes were modulators, and the other were TGs of RelA. This modulator-target interaction network can be represented as a matrix, whose rows are modulators, columns are TGs. The matrix elements correspond to the edges in the modulatory network, represented by the main effect of a modulator on a specific TG, . In this way, the association between any pair of modulators can be evaluated by their similar effect on all the TGs, and conversely, the association between two TGs can be assessed by their similar modulation by all the modulators. Subsequently, we used hierarchical biclustering to uncover the global organization of the modulatory network. The hierarchical approach to clustering was chosen because we expect that modulator-target interactions would reflect biological functions related to the NF-κB transcriptional activity, and such functions are known to form a nested structure. Such multi-level structure cannot be reflected by flat clustering methods, such as kmeans, and most affinity propagation algorithms. Also, model-based clustering did not apply in this case, because no a-priori model of the complex and diverse interaction network is known.
Specifically, the rows and columns of this matrix were clustered using the agglomerative clustergram function from Matlab with default parameters. The M-TG network was parsed into a hierarchy of groups according to the similarity of the value of . The hierarchical clustering dendrograms and the heatmaps for the matrix ( Figure 5) show obvious large modules. As shown in Figure 5, modulators contain three dense clusters, A, B and C. Also the TG show three main clusters (I, II and III in Fig. 4). The lists of genes in these clusters are presented in Supplementary Data File S5. Clustering according to did not produce modules as obvious as -and using both and did not improve the results (data not shown). Therefore we based further analysis on clustering according to . The module composition and the significant triplets in each module are provided in Supplementary Data File S5. We found that modulators in the PCNMs tended to up-regulate the TGs in the same module, while the Modulators in the NCNMs down-regulated the TGs in the same module (Selected modules are presented in Figure 6). To identify the function associated with each module, we analyzed the enrichment of GO terms within clusters found in the network of all binding proteins and all RelA TGs. First we considered the network modules that show dominant positive correlations between the modulators from cluster A, B and C and TGs from cluster I, II and III (A-II, B-II, C-I and C-III in Figure 5), termed Positive Correlation Network Modules (PCNMs). For each PCNM, we find GO terms enriched in both the set of modulators and TGs, which demonstrates the functionality of the module (Table III). The functional annotations were characterized using BiNGO software (26) in Cytoscape (27).
As shown in Figure 5 and The constructed modulation network also contains four main negatively correlated network modules (NCNMs): C-II, A-I, B-I and B-III, and were also analyzed, where the modulator clusters are respectively negatively correlated with the TG clusters (Table III and Supplementary Data File  S6). Among the overlapping GO categories that are the top significant categories for the TGs for C-II are regulation of cell death (rank 52), cellular response to stimulus, regulation of cell cycles, multi-organism process, regulation of protein metabolic process, positive regulation of apoptosis. Besides, C-II is also enriched in other processes including immune system development, cellular response to chemical stimulus, hemopoiesis, regulation of protein metabolic process, regulation of cell cycle, multi-organism process, cell cycle arrest, placenta development, initiation of viral infection, maintenance of location, and phosphorylation.
As seen in Figure 5 and Table III, a finer hierarchy of submodules can be observed if deeper levels of clustering are considered for the modulation matrix. The functional annotations of these sub-modules do not differ significantly from their parent modules. Specifically, module C-II splits into C-II-1 and C-II-2 and further into C-II-1a and C-II-1b. The enriched GO categories for C-II-1 and C-II-2 are very similar and also consistent with the sub-sub-modules. For C-II-1a, the enriched GO categories are regulation of genespecific transcription, positive regulation of molecular function and protein modification process, and cellular response to stimulus. For C-II-1b, the enriched GO categories are regulation of apoptosis, multi-organism process, positive regulation of apoptosis, regulation of cell cycle, regulation of cellular protein metabolic process Functional analysis shows that A-I is most enriched in transcription regulation, B-I in response to wounding and fat cell differentiation, while B-III in negative regulation of gene expression. Representative overlapping GO terms of A-I, B-I and B-III can be found in Supplementary Data File S6 and Figure 5. Our analysis suggests that the functional organization of the modulatory network may reflect the mechanisms of the cooperation between different processes.

Validation of the prediction using triplets predicted from the unconstrained TG candidates.
To validate the prediction method, we tested whether the predicted triplets were enriched in target genes obtained independently from the prediction. To this end, we applied the model allowing all genes as candidate targets of modulated regulation by NF-κB/RelA. Next, we selected all inferred target genes that had significant  andˆm  in at least one predicted triplet. Using p-value thresholds between 1e-6 (for low false positive rate) and 1e-3 (for low false negative rate), we obtained between 4785 and 19115 candidate genes dependent on the modulated regulation by NF-κB/RelA. Over the entire range, the predicted TGs are significantly enriched in the 830 targets of RelA identified by Chip-seq (Table IV). Also the enrichments in the literature-derived and combined target lists are significant (see Table V-VI).
Experimental validation of the STAT1-RelA-TGs triplets from the unconstrained TG candidates STAT1 (signal transducers and activators of transcription) is an important modulator of RelA as a pro-inflammatory effector. Acetylated STAT1 will repress the expression of anti-apoptosis NF-κB target genes via binding RelA and nuclear export (38) as revealed by in situ immunofluorescence analysis. To verify the interaction between RelA and STAT1 in the nucleus, we used immunoprecipitation and quantitative stable isotope dilution (SID)-selected monitoring (SRM)-mass spectrometry MS (68) in the sample obtained from the immunoprecipitation by anti-RelA antibody and nonspecific IgG (negative control) from the nuclear extract of A549 cells. The proteins associated with the antibodies were captured on protein-A beads and subjected to LC-SRM-MS to quantify the level of STAT1 in each sample. Compared to the negative control sample, STAT1 was enriched approximately 3.2 fold in the sample obtained from the immunoprecipitation of the anti-RelA antibody, indicating that STAT1 was directly associated with RelA. Thus, the interaction between STAT1 and RelA is confirmed by our SRM experiment (Figure 7) As a modulator of NF-κB/RelA-regulated expression, Stat1 is also predicted to modulate 2450 genes at a threshold of p-value of and pvalue of smaller than 1e-2 in this study. We found that STAT1 modulates RelA-dependent gene transcription with multiple action modes depending on the TGs. To validate the predicted Stat1-RelA-TG triplets and their corresponding action modes, we selected 12 of the predicted TGs covering four action modes and p-values of  and ˆm  in the range from zero to 1e-2 (See Supplementary Data File S7). We predicted 31 triplets. The corresponding estimated false discovery rate is 0.19 at the thresholds. We conducted qPCR experiments measuring mRNA expression of the 12 TGs, in STAT1-deleted (U3A, STAT1-/-) and STAT-1 wild-type cells (2f) stimulated with poly(I:C) as described in Methods. We performed three independent quantitative PCR experiments for each triplet. 32% of the predicted triplets (10 out of 31 triplets as shown in Supplementary Data File S7) were validated by experiment (P= 0.016 with binomial distribution test). The validated 10 triplets involve six TGs: EGFR kinase substrate 8 (EPS8), Integrin, Alpha E (ITGA6), Short Stature Homeobox 2 (SHOX2), and CCCH-Type Containing 7B (ZC3H7B), Neurotrophin 4 (NTF4) and O-Sialoglycoprotein Endopeptidase-Like (OSGEPL1). Considering the estimated fold discovery ratio at the model thresholds is 0.19, our prediction approach is effective. Since each gene has several probesets, multiple triplets with different action modes may be predicted for each TG (Supplementary Data File S7). For each TG, only the most significant triplet was kept. In this way, four of the 12 TGs were validated by the experiment with the same action mode. As shown in Figure 8, four predicted TGs were fully confirmed along with the exact same action modes: EPS8, ITGA6, SHOX2, and ZC3H7B. Interestingly, the fully validated four TGs were among highest ranked according to their p-values of the probabilistic model parameterwith ranks 1, 2, 4, and 6 (the prevalence of highest ranking TGs is significant: P=0.049 with rank sum test). The experimental results here justify our lower p-value thresholds in triplet prediction from all the TGs and binding proteins of RelA.
Although STAT1 is both a binding protein and TG of RelA as newly identified by us, and thus involved in a feed-back pathway, our analysis validated the predicted triplet action modes of STAT1-RelA. Considering the high noise of the candidate targets, which actually are quantified by gene expression profiles from the genome-wide genes, our proposed computational model is effective in generating testable general triplet hypotheses. We used a combination of experimental and computational approaches to characterize the modulation of transcription regulated by NF-κB/RelA. We selected candidate modulators of NF-κB/RelA using Liquid Chromatography/Tandem Mass Spectrometry (LC-MS/MS), identifying 365 binding proteins including 334 not previously reported. The identified modulators are functionally enriched in categories previously identified as related to NF-κB/RelA activity, as well as novel ones, such as ATP synthesis coupled electron transport, DNA helicase activity and conformation change, Ku70:Ku80 complex, RNA transport, oxidation reduction and cytoskeleton organization. Among the most significant enriched GO functions of the newly identified binding proteins and also of the existing modulators are: RNA processing (alternative splicing), cell cycle, mitochondrion and ribosome biogenesis, which may be candidate alternative processes interdependent with the NF-κB/RelA pathways. We implemented a probabilistic model for modulatory network inference, which combines an earlier approach of GEM (Babur et al, 2010) with Shannon Entropy filtering. Our method is selective for modulators with well-defined prevalent action modes. Using this probabilistic model, we generated a network of 8349 M-TF-TG triplets. The results have been validated by enrichments of predicted TGs in NF-κB/RelA targets identified in a ChIP-Seq experiment and by functional analysis of the identified regulations.

DISCUSSION
We discovered modules by hierarchical clustering of the regulatory network built from all the binding proteins and TGs. Some modulators might affect the rate of transcription of genes of all or any of NF-kB/RelA complex proteins themselves. In this case, the corresponding modified modulators are target genes themselves.
We removed the feedback loop to avoid this type of modulators in our network analysis. Additional validation of the method comes from literaturebased analysis of the functions of the top predicted modulators.
One of the predicted RelA modulators is STAT1, a member of the Signal Transducers and Activators of Transcription (STAT) family of transcription factors, which modulate NF-κB mediated regulatory pathways in response to many cell stimuli and pathogens (38). Considering the important role of STAT1 in modulating RelA activity, we analyzed the predicted STAT1-RelA-TGs triplets and validated their action modes by STAT1 deletion and quantitative real-time PCR in human fibrosarcoma cells.
In conclusion, collecting all candidate binding proteins and the TGs of RelA (the main component of NF-κB complex), and predicting the M-TF-TG triplets resulted in an integrated functional interaction network, containing potential new mechanisms of NF-κB activity and specificity. Analysis of module organization of the modulation network may reveal mechanisms involved in the cooperation between different biological processes in immune responses, inflammation, cancer and others. Identification of new mechanisms of RelA-dependent gene expression, and related signaling pathways that cross talk with NF-κB, may provide guidance for the design of targeted therapeutics. The newly discovered 562 NF-κB/RelA modulators with distinct profiles of TGs may be potential drug targets. The modulatory network and its clusterings clarify mechanisms of achieving NF-κB/RelA specificity through modulators. Our computational and statistical approach can be readily applied to elucidate modulation of other transcription factors.

Figure 6 Action modes of the significant triplets in network modules A-I, A-II, B-I and B-III .
Rows represent modulators and columns are TGs. Small squares are color-coded according the action mode of the triplet composed by the corresponding modulator and the TG (color scheme is the same as in Fig. 2 and Fig. 3 of the main text: dark greeninhibition attenuation, light greeninhibition enhancement, yellowinhibition inversion, light orangeactivation inversion, dark orangeactivation enhancement, redactivation attenuation). The numeric value in each square is the p-value of ̂ . Only statistically significant triplets are provided.   Tables   Table I. Interpretation of the categories of modulation and the constraints that the categories satisfy Inhibition attenuation TF, alone, inhibits TG; M attenuates TF activity + -0/--/+ 0/-Inhibition enhancement modulated TF inhibits TG -0/----Inhibition inversion TF, alone, inhibits TG; M inverts TF activity + -+ + + Activation inversion TF, alone, activates TG; M inverts TF activity -+ ---Activation enhancement modulated TF activates TG + 0/+ + + + Activation attenuation TF, alone, activates TG; M attenuates TF activity -+ 0/+ +/-0/+ The "+" and "-" signs indicate significantly positive and negative values, respectively. "0/-" denotes not significantly different from zero or significantly negative values, "0/+", not significantly different from zero or significantly positive values, "+/-" and "-/+", non-zero values. Moreover, the classification is for triplets where the null hypotheses in Eq. (10) were also rejected.

C-I-1
Regulation of apoptosis, aging, cellular component organization and positive regulation of apoptosis.

C-I-2
Regulation of apoptosis, response to chemical stimulus.

C-II
Regulation of cell death, cellular response to stimulus, regulation of cell cycle, multiorganism process, regulation of protein metabolic process, positive regulation of apoptosis,.

C-II-1
Regulation of apoptosis, positive regulation of gene expression, protein modification process, and response to organic substance.

C-II-1a
Regulation of gene-specific transcription, positive regulation of molecular function, positive regulation of protein modification process, and cellular response to stimulus