J. Biol. Chem., Vol. 282, Issue 33, 23981-23995, August 17, 2007
From the Department of Biology, Bioinformatics Program, and Department of Biomedical Engineering, Boston University, Boston, Massachusetts 02215
Received for publication, March 8, 2007, and in revised form, May 31, 2007.
| ABSTRACT |
|---|
| INTRODUCTION |
|---|
100 primary response genes (1, 2). Induction of these genes does not require de novo protein synthesis and is therefore mediated by pre-existing transcription factors. Most of the characterized primary response genes (termed immediate-early genes) are maximally induced within 30 min of growth factor stimulation, although a few examples of primary response genes that are induced with slower kinetics have been described (3–8). Many of the well characterized primary response genes encode transcription factors, which regulate downstream secondary response genes as part of a larger transcriptional program (1, 2). Secondary response genes are induced later than immediate-early genes, and their induction is distinct from that of primary response genes in requiring de novo protein synthesis. Thus, the generally accepted model of growth factor-induced gene expression has two major components: the initial induction of primary response (immediate-early) genes, followed by a compulsory delay allowing translation of their mRNAs to produce the transcription factors that then induce the secondary response genes.
In this study, we employed global expression profiling to analyze the temporal program of transcriptional alterations induced by growth factor stimulation of human cells. As expected, we identified distinct patterns of rapid and delayed gene inductions. Surprisingly, however, we observed that a large fraction of delayed inductions did not require protein synthesis, and therefore represented delayed induction of primary response genes rather than induction of secondary response genes. These results suggested that the transcriptional program induced by growth factor stimulation involved not only the induction of immediate-early and secondary response genes but also the induction of a large group of delayed primary response genes that had previously been unrecognized.
The delayed primary response genes differed from immediate-early genes in both their functions and genomic architecture. Whereas many immediate-early genes encode transcription factors, transcriptional regulators were not prevalent among the delayed primary response genes. Rapid transcriptional induction of immediate-early genes was associated with several unique characteristics of these genes, including overrepresentation of shared transcription factor binding sites in upstream sequences of this gene set, high affinity TATA boxes in their core promoters, and short primary transcripts with few exons. In all of these features, delayed primary response genes more closely resembled other genes in the genome. These findings distinguish immediate-early from delayed primary response genes in terms of both function and transcriptional regulation, and suggest that immediate-early genes may have been selected for rapid induction based on their functions as transcriptional regulators. In contrast, the slower induction of both delayed primary and secondary response genes is consistent with their activities as effectors rather than mediators of growth factor signaling.
| EXPERIMENTAL PROCEDURES |
|---|
Oligonucleotide Array Spotting—Microarrays were fabricated by resuspending 21,329 70-mer oligonucleotides from the Human Genome Array-Ready Oligo Set version 2.0 (Operon) in 3x SSC to a final concentration of 30 µM and spotting them onto aminosilane-coated GAPS II slides (Corning Glass) with an OmniGrid accent microarrayer (GeneMachines). After drying, the slides were post-processed according to the oligonucleotide protocol provided by the manufacturer. Briefly, to promote spot uniformity, the microarrays were rehydrated with nuclease-free water and snap-dried on a 100 °C hot plate. The slides were UV cross-linked with 65 mJ of energy and shaken for 20 min in a blocking solution of 1-methyl-2-pyrrolidinone, 171 mM succinic anhydride, and 43 mM sodium borate. Finally, the slides were washed successively in water and 95% ethanol and then centrifuged at 800 rpm for 5 min to dry.
Microarray Sample Preparation, Hybridization, and Image Analysis—Starting with 100 ng of poly(A)+ RNA, one round of RNA amplification was performed with the MessageAmp aRNA amplification kit (Ambion) using a 4:1 amino-allyl UTP:UTP ratio for aRNA incorporation. For each sample, 8 µg of aRNA was coupled to N-hydroxysuccinimidyl esters of cyanine-3 or cyanine-5 (Amersham Biosciences). Following clean up, treated and untreated aRNA samples with opposing cyanine labels were combined, concentrated, and treated with a fragmentation reagent (Ambion) according to the manufacturer's protocol. For each slide, 4 µg of both treated and untreated cyanine-labeled aRNA samples were combined with a hybridization buffer (2.3x SSC, 18 mM HEPES, 0.2 mg/ml bovine serum albumin, 0.6 mg/ml poly(A), 0.2% SDS), heat-denatured for 3 min at 95 °C, and applied to microarrays under a LifterSlip coverslip (Erie Scientific). The slides were placed in a hybridization chamber (Dietech) and incubated in a 63 °C water bath for 16 h. Following hybridization, the slides were successively washed in 0.6x SSC with 0.025% SDS, 0.05x SSC, and water and then dried by centrifugation at 1000 rpm for 3 min. The microarrays were scanned with an Axon 4000B scanner, and adaptive spot segmentation was performed with GenePix Pro software (version 5.0) (Axon Instruments). For each treated sample, three independent replicate microarray experiments were performed.
Microarray Data Analysis—Triplicate dye-swap, background-subtracted median intensity values were used as input to the LIMMA analysis package (9) in Bioconductor (10), and average LOESS-corrected log2 ratios were used to estimate differential gene expression. For the PDGF-treated samples, genes with positive log2 ratios greater than or equal to 1 (2-fold) relative to untreated samples and FDR-corrected (11) moderated t test p values less than 0.01 were considered differentially expressed. Additional microarray dye-swap experiments were performed to identify genes with PDGF-stimulated inductions that were independent of new protein synthesis. RNA was extracted from cells treated with cycloheximide for 30 min followed by 2 or 4 h PDGF treatments. For each of three replicates, two microarray experiments were performed with different reference samples. The first compared cycloheximide and PDGF-treated samples to untreated samples, whereas the other compared cycloheximide and PDGF-treated samples to PDGF-treated samples.
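The selection rule described above (log2 ratio of at least 1 and a corrected p value below 0.01) can be sketched as follows. This is a minimal illustration only: it assumes the false-discovery rate correction is the Benjamini-Hochberg procedure, which the text cites but does not name, and it takes the moderated t test p values as given rather than recomputing them as LIMMA does. The function and parameter names are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def induced_genes(log2_ratios, p_values, min_log2=1.0, max_fdr=0.01):
    """Names of genes that pass both the fold-change and the FDR cutoffs.

    log2_ratios and p_values are dicts keyed by gene name (assumed inputs).
    """
    names = list(log2_ratios)
    fdr = benjamini_hochberg([p_values[g] for g in names])
    return [g for g, q in zip(names, fdr)
            if log2_ratios[g] >= min_log2 and q < max_fdr]
```

Applying the fold-change cutoff after the correction, as here, keeps the multiple-testing adjustment based on every gene tested rather than only on those that already look induced.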
Real Time RT-PCR for Microarray Validations and Heterogeneous Nuclear RNA Measurements—Reverse transcription of 0.5 µg of total RNA was performed in 50 µl using SYBR Green RT-PCR reagents and random hexamer primers (Applied Biosystems) as recommended by the manufacturer. Following a 95 °C incubation for 10 min, 40 cycles of PCR (95 °C/15 s; 60 °C/1 min) were then performed on an ABI Prism 7900HT sequence detection system with 0.5 µl of the RT reaction, 100 nM PCR primers (supplemental Table 1), and SYBR Green PCR master mix in 5-µl reactions. Threshold cycles (CT) for three replicate reactions were determined using Sequence Detection System software (version 2.2.2), and relative transcript abundance was calculated following normalization with a glyceraldehyde-3-phosphate dehydrogenase PCR amplicon. Amplification of only a single species was verified by a dissociation curve for each reaction.
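The text does not give the formula used for relative transcript abundance. One common way to normalize threshold cycles to a reference amplicon is the comparative CT calculation, sketched below on the assumption of perfect amplification efficiency (a doubling per cycle). The function name and argument layout are hypothetical.

```python
from statistics import mean


def relative_abundance(ct_target, ct_reference, ct_target_ctrl, ct_reference_ctrl):
    """Fold change of a target transcript in treated versus control cells.

    Each argument is a list of replicate threshold cycles (CT). The reference
    is the normalizing amplicon (GAPDH in the text). Assumes the amount of
    product doubles every cycle, which is an idealization.
    """
    delta_treated = mean(ct_target) - mean(ct_reference)
    delta_control = mean(ct_target_ctrl) - mean(ct_reference_ctrl)
    # A lower normalized CT means more starting template, hence the minus sign.
    return 2.0 ** -(delta_treated - delta_control)
```

For example, a target that crosses threshold two cycles earlier after treatment, with an unchanged reference, corresponds to a 4-fold induction under this assumption.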
Chromatin Immunoprecipitation—Chromatin immunoprecipitations were performed as described previously (12), with modifications. Chromatin was immunoprecipitated overnight at 4 °C using 6.25 µg/ml anti-pol II antibody (N-20) (sc-899, Santa Cruz Biotechnology). Protein A-agarose beads were washed successively in low salt wash, high salt wash, LiCl wash, and twice in 1x TE. Immunoprecipitated chromatin was quantified with real time PCR using primers designed in proximity to the transcription start site as annotated in Entrez Gene (see supplemental Table 1 for primer sequences).
Gene Feature and Genomic Sequence Data—Unless otherwise noted, all gene and transcript annotations, including genomic positions of transcription initiation sites and exon/intron boundaries for 23,969 human RefSeq transcripts, were obtained from the Entrez Gene data base, corresponding to human genome build version 36.1 (13). Model transcripts (RefSeq accession numbers with "XM_" prefix) and transcripts mapped to alternate human contig assemblies (RefSeq accession numbers with "AC_" prefix) were not included in these analyses. For the core promoter analysis and splice site characterization, genomic sequence data were extracted from assembled RefSeq chromosome sequences for human genome build version 36.1.
Gene Ontology Analysis—Gene Ontology (GO) terms were obtained from the Entrez Gene data base for all human genes, and transitive closure of each term relationship was extracted from the daily GO build (14). Functional enrichment of coexpressed gene sets was determined with a one-tailed Fisher's exact test (15) by comparing the frequency of each term and all ancestor terms against the expected frequency from all annotated genes on the microarray. Only genes with at least one GO annotation were included in the analysis.
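For a single GO term, the one-tailed Fisher's exact test for enrichment is equivalent to the upper tail of a hypergeometric distribution. The sketch below shows that calculation with the standard library only; the argument names are illustrative and the counts must be supplied by the caller.

```python
from math import comb


def fisher_enrichment_p(k, n, K, N):
    """One-tailed enrichment p value, P(X >= k), for a hypergeometric X.

    k: genes in the coexpressed set annotated with the term
    n: size of the coexpressed set
    K: annotated genes on the array carrying the term
    N: all annotated genes on the array
    """
    upper = min(n, K)
    total = comb(N, n)
    # Sum the probabilities of seeing k or more term-bearing genes by chance.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total
```

Because each term is counted together with its ancestors, the same function would be called once per term after annotations have been propagated up the ontology.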
Transcription Factor Binding Site Analysis—Over-representation of transcription factor binding sites in the upstream regions of immediate-early and delayed primary response genes was analyzed as described previously (12, 16), using the program Tractor5 with 588 vertebrate transcription factor binding site matrices from TRANSFAC Professional (version 11.1) (17) and "minSUM" Match thresholds (18). For each matrix, the predicted site frequencies per gene for both the immediate-early and delayed primary response gene sets were compared with the site frequencies per gene observed in the upstream regions of 350 background genes using a permutation test. These background genes were randomly selected from genes expressed but not induced by PDGF on the microarrays. Two independent analyses were performed. The first used only human sequences for predictions, and the second considered only sites predicted within the same position of a human-dog-mouse multiple sequence alignment of each upstream region. Sequences and MULTIZ alignments were selected using the genome browser at the University of California, Santa Cruz (human, mouse, and dog versions hg18, mm8, and canFam2, respectively) (19). For both analyses, the results were filtered to include only matrices that predicted, on average, less than 1 site per kb in background sequences and that detected sites upstream of at least 10% of the gene set being tested. p values were adjusted with a false-discovery rate correction (11), and only the matrices that met these two filtering criteria were considered in the correction.
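The comparison of site frequencies against background genes can be illustrated with a simple label-shuffling permutation test. The exact statistic and number of permutations used by the Tractor program are not stated here, so the difference in mean sites per gene, the permutation count, and all names below are assumptions made for illustration.

```python
import random


def permutation_p(test_counts, background_counts, n_perm=10000, seed=0):
    """One-sided p value that the test set has more sites per gene.

    test_counts, background_counts: predicted site counts per gene.
    """
    rng = random.Random(seed)
    n_test = len(test_counts)
    pooled = list(test_counts) + list(background_counts)
    observed = sum(test_counts) / n_test
    hits = 0
    for _ in range(n_perm):
        # Reassign genes to the test set at random and recompute the mean.
        rng.shuffle(pooled)
        if sum(pooled[:n_test]) / n_test >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


def passes_filter(background_sites_per_kb, test_counts):
    """Filter from the text: under 1 site per kb in the background and
    at least one site upstream of 10% or more of the test genes."""
    fraction_with_site = sum(1 for c in test_counts if c > 0) / len(test_counts)
    return background_sites_per_kb < 1.0 and fraction_with_site >= 0.10
```

Only matrices for which `passes_filter` returns true would then enter the false-discovery rate correction.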
Core Promoter Analysis—The promoters for immediate-early and delayed primary response genes were scanned using the Match algorithm with no score thresholds (18) and position-specific scoring matrices representing six core promoter elements (supplemental Fig. 1). The regions –48 to –21, –55 to –5, –13 to +15, +7 to +38, +17 to +43, and +89 to +177 relative to the transcription initiation site were scanned for the TFIIB recognition element (20), TATA box, initiator, motif 10 element (21), downstream core promoter element (22), and multiple start site element downstream (MED-1) (23), respectively. For each core promoter element, the highest scoring position within each window on the forward strand was recorded for each transcript. Because some genes encode multiple transcripts, the maximum scores among all transcripts were determined for each human gene in the Entrez Gene data base. Assessment of the biological significance of TATA prediction scores was performed as described previously (12). To identify at least 95% of sequences in three classes of TATA-binding sites defined previously (24), "TATAAA," "TAAATA," and "TATATA," a threshold of 0.7 was selected. This threshold was then used to identify the genome-wide frequency of TATA boxes predicted between –55 and –5 upstream of 23,969 human RefSeq transcripts.
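The window scan described above can be sketched as follows. The scoring here is a simplified stand-in, a frequency sum rescaled to the range 0 to 1, and is not the Match formula, which additionally weights positions by information content. The matrix passed in is supplied by the caller; no matrix from supplemental Fig. 1 is reproduced, and the function names are hypothetical.

```python
def best_window_score(window, matrix):
    """Return (best normalized score, 0-based offset) for a motif in a window.

    window: forward-strand sequence of the scanned region (e.g. -55 to -5).
    matrix: one dict per motif position mapping base -> frequency.
    """
    width = len(matrix)
    lowest = sum(min(col.values()) for col in matrix)
    highest = sum(max(col.values()) for col in matrix)
    best_score, best_offset = float("-inf"), None
    for offset in range(len(window) - width + 1):
        site = window[offset:offset + width].upper()
        # Bases missing from a column (e.g. N) receive that column's minimum.
        raw = sum(col.get(base, min(col.values()))
                  for col, base in zip(matrix, site))
        score = (raw - lowest) / (highest - lowest)
        if score > best_score:
            best_score, best_offset = score, offset
    return best_score, best_offset


def gene_score(transcript_windows, matrix):
    """Maximum window score over all transcripts of one gene."""
    return max(best_window_score(w, matrix)[0] for w in transcript_windows)
```

A gene would then be called TATA-containing when `gene_score` for the –55 to –5 window meets the chosen threshold (0.7 in the text, on the Match scale rather than this simplified one).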
Cap Analysis of Gene Expression Tag Classification—Human data for cap analysis of gene expression (CAGE) tag clusters and their attributes, and the classification of tag clusters (25), were combined into a single relational data base. For each RefSeq identifier, a representative transcription start site with maximal tag frequency and the associated class identifier were extracted. Carninci et al. (25) limited the cluster annotation to those with at least 100 tags; therefore, only 5,755 unique RefSeq transcripts were associated with a particular class. Classification information was available for 18 of 46 immediate-early genes, 21 of 50 delayed primary response genes, and 6 of 17 secondary response genes. To test for enrichment of each class in the primary response gene sets, one-sided Fisher's exact test p values were calculated.
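Selecting one representative tag cluster per RefSeq identifier amounts to a group-wise maximum over tag counts. The sketch below assumes each cluster is available as a (RefSeq identifier, tag count, class label) tuple; that layout and the names are hypothetical, and the class labels are treated as opaque strings.

```python
def representative_clusters(clusters, min_tags=100):
    """Map each RefSeq identifier to the class of its most tagged cluster.

    clusters: iterable of (refseq_id, tag_count, cluster_class) tuples.
    Clusters below min_tags are skipped, mirroring the 100-tag limit
    on annotated clusters mentioned in the text.
    """
    best = {}
    for refseq_id, tag_count, cluster_class in clusters:
        if tag_count < min_tags:
            continue
        # Keep the cluster with the highest tag count seen so far.
        if refseq_id not in best or tag_count > best[refseq_id][0]:
            best[refseq_id] = (tag_count, cluster_class)
    return {refseq_id: cls for refseq_id, (_, cls) in best.items()}
```

The resulting class assignments per gene set are the counts that would feed the one-sided Fisher's exact test.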
| RESULTS |
|---|
The induction kinetics of several representative genes were analyzed using real time RT-PCR with a finer resolution time course following 0.5, 1, 2, 3, 4, 5, and 6 h of PDGF treatment (Fig. 2). Consistent with the current microarray results and previous studies (16, 29), two well characterized primary response genes, FOS and MCL1, exhibited rapid but transient inductions that peaked at 0.5 h following PDGF treatment (Fig. 2A). MMP3, a known secondary response gene, was not significantly induced until 3 h of PDGF treatment and was blocked by cycloheximide, confirming the array results (Fig. 2B). In contrast to FOS and MCL1, five delayed primary response genes, VCL, PLOD2, DKK1, SOD2, and CCND1, demonstrated slower inductions, reaching maximal mRNA levels between 1 and 4 h following PDGF stimulation (Fig. 2C). Consistent with the microarray results, induction of these genes was not blocked by cycloheximide, confirming their classification as delayed primary response genes. Additionally, to validate the array results, transcript levels of a total of 19 genes following 0.5, 2, and 4 h of PDGF treatment, in the presence and absence of cycloheximide, were independently tested using quantitative real time RT-PCR, the results of which confirmed the microarray data (supplemental Table 3).
To determine whether representative genes were induced with similar kinetics in response to mitogens other than PDGF, quiescent T98G cells were alternatively stimulated with EGF or serum in the presence or absence of cycloheximide (Fig. 3). Consistent with the results obtained with PDGF, FOS and MCL1 were induced as immediate-early genes, MMP3 as a secondary response gene, and VCL, PLOD2, DKK1, SOD2, and CCND1 as delayed primary response genes following both serum and EGF treatment.
κB (NF-κB, represented by the V$CREL_01 and V$NFKAPPAB65_01 matrices), PAX-3, and early growth response (KROX) transcription factors, as significantly over-represented in upstream sequences of the immediate-early genes (Table 2, top; the entire list can be found in supplemental Table 4). In contrast, upstream regions of the set of delayed primary response genes lacked over-represented binding site matrices for these or other transcription factors.
κB, cyclic AMP response element-binding protein (CREB), and AP-1 (activator protein-1), were significantly over-represented in the upstream regions of immediate-early genes. Notably, neither these nor other transcription factor binding site matrices were significantly over-represented in the set of upstream regions of the delayed response genes. Similar results were obtained when scanning 1-, 3-, or 5-kb upstream regions.

The core promoter sequences were also examined for possible differences in binding sites for general transcription factors. The core promoter includes the TATA box, the TFIIB recognition element (20), the initiator, the motif 10 element (21), the downstream core promoter element, frequently found in TATA-less promoters (22), and the multiple start site element downstream (MED-1) (23) (Fig. 4A). Core promoter regions of immediate-early and delayed primary response genes were compared by analysis of the promoter sequences of each gene set near the expected positions for six core promoter elements (Fig. 4A). The highest scoring subsequences within these windows were determined using Match with frequency matrices representing each promoter element (supplemental Fig. 1). For each element, the distributions of scores for the immediate-early and delayed primary response genes were compared with one another and to a genome-wide score distribution with the Wilcoxon rank sum test (Fig. 4B).
The results indicate a significant difference in the TATA scores for the immediate-early genes (p = 5.1 × 10⁻⁷) relative to scores for 18,191 human genes in the Entrez Gene data base, whereas significant differences in scores for the other core promoter elements were not observed. Furthermore, immediate-early gene TATA scores were significantly higher than those for the delayed primary response genes (p = 2.0 × 10⁻³) (Fig. 4B). In contrast, TATA scores of the delayed primary response genes did not differ significantly from all genes in the Entrez Gene data base (TATA scores for all individual genes can be found in supplemental Table 5).
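The Wilcoxon rank sum comparison of two score distributions can be sketched with the standard library. This version uses the normal approximation with a tie correction and no continuity correction; the software actually used for the reported p values is not stated, so results from this sketch may differ slightly from an exact test. It fails if every value is identical, because the variance is then zero.

```python
from math import erfc, sqrt


def rank_sum_test(x, y):
    """Return (U statistic for x, two-sided p value) by normal approximation."""
    pooled = sorted((v, g) for g, vals in ((0, x), (1, y)) for v in vals)
    n = len(pooled)
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values share the average of ranks i+1 .. j.
        for k in range(i, j):
            ranks[k] = (i + 1 + j) / 2
        tie_term += (j - i) ** 3 - (j - i)
        i = j
    n1, n2 = len(x), len(y)
    rank_sum_x = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u - mean_u) / sqrt(var_u)
    # Two-sided tail probability of a standard normal variable.
    return u, erfc(abs(z) / sqrt(2))
```

With gene sets of 46 and 50 genes compared against thousands of background genes, the normal approximation is generally reasonable; for very small samples an exact test would be preferable.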
The distributions of TATA scores for the entire Entrez Gene data base, immediate-early genes, and delayed primary response genes are plotted in Fig. 4C. A threshold score of 0.7, which identifies more than 95% of sequences bound by human TATA-binding protein (hTBP) in vitro (24), was used to define a functional TATA box (12). When applied genome-wide, this threshold identified 22% of genes with at least one transcript containing a TATA box, a figure similar to other estimates of TATA box prevalence (30). Using this threshold, 27 of 46 (59%) immediate-early and 17 of 50 (34%) delayed primary response genes contained a TATA box. The TATA box prevalence in immediate-early genes differed significantly (p = 1.0 × 10⁻⁷ by one-sided Fisher's exact test) from the Entrez Gene data base.
RNA Polymerase II Occupancy at Promoters of Immediate-Early and Delayed Primary Response Genes—Identification of distinct transcription factor binding site enrichment and TATA box abundance upstream of the immediate-early genes suggests that the lag in delayed primary response gene expression may result from slower transcription initiation rates, which could be a consequence of RNA polymerase II (pol II) abundance and/or recruitment at target gene promoters. To investigate this further, pol II binding to promoters of immediate-early and delayed primary response genes was investigated by chromatin immunoprecipitation (ChIP) analysis.
Quiescent T98G cells were treated with PDGF for 0–4 h and subjected to pol II ChIP, using an antibody against the N terminus of pol II so that recognition of pol II was not affected by modifications of its C-terminal domain. pol II occupancy was examined at the transcription start sites of 11 immediate-early (Fig. 5A) and 19 delayed primary response genes (Fig. 5B). All of these genes had pol II occupancy above that observed at the nontranscribed β-globin gene (Fig. 5), as well as several other negative control genes (data not shown). For the majority of genes (73% of immediate-early genes and 68% of delayed primary response genes), pol II occupancy did not change upon PDGF stimulation, suggesting that a post-polymerase recruitment mechanism may be responsible for gene induction. Preloaded pol II at transcription start sites is not unprecedented, as FOS and MYC are well established examples of genes with a paused polymerase in their proximal promoter regions in unstimulated cells (31).
For the three immediate-early genes that exhibited an increase (>1.75-fold) in pol II promoter occupancy upon PDGF treatment, all three had maximum pol II occupancy at 0.5 h of PDGF treatment, coincident with their mRNA inductions (Fig. 5A). For the delayed primary response genes that had increased pol II occupancy upon PDGF treatment, three of six (DKK1, DDX21, and ESDN) had peak pol II occupancy after 2 h of PDGF treatment (Fig. 5B), consistent with the possibility that delayed recruitment of pol II may play a part in their delayed mRNA inductions. However, the other three of these six delayed primary response genes (VCL, TGFB2, and EPHA2) demonstrated peak pol II occupancy after only 0.5 h of PDGF treatment, similar to what was observed for some immediate-early genes and much earlier than their mRNA inductions. This suggests that the delay in mRNA induction for these genes occurs after the recruitment of pol II.
Although pol II recruitment does not appear to be a major factor resulting in delayed mRNA inductions, there was a clear difference in pol II occupancy between the immediate-early and delayed primary response gene groups. For both untreated cells (Fig. 5C) and for the time point at which maximum pol II occupancy was observed (Fig. 5D), the immediate-early genes had significantly higher pol II occupancy than the delayed primary response genes (p = 0.026 for untreated cells and p = 0.0017 at the time of maximum pol II occupancy), possibly correlating with the differences in promoters and transcription start sites between these two groups of genes.
Analysis of hnRNA Transcription—Because ~70% of the genes tested by ChIP did not demonstrate a change in pol II occupancy upon PDGF treatment, it is possible that transcriptional changes are not responsible for the observed mRNA inductions. To test this, hnRNA levels were measured with real time RT-PCR using 5'-biased intron-specific primers (32) for 23 delayed primary response genes following PDGF stimulation (supplemental Table 6). For all of the genes tested, hnRNA levels increased ≥2-fold and were similar to or greater than the corresponding mRNA inductions upon PDGF treatment. Thus, although pol II occupancy at these genes is not altered, their transcription is induced upon PDGF treatment.
Seven of the delayed primary response genes (30%) exhibited a clear delay in hnRNA synthesis, not reaching peak hnRNA levels until 1–3 h of PDGF treatment (Fig. 6A, panel 4). For these genes, the delay in transcription of hnRNA was consistent with their delayed mRNA expression (Fig. 2 and Fig. 6B). Two of these genes, DKK1 and ESDN, also have delayed pol II recruitment with peak occupancy at 2 h (see Fig. 5B).
In contrast, five delayed primary response genes (22%), including VCL, responded with transcriptional kinetics similar to immediate-early genes, with peak induction of hnRNA at 15 min of PDGF treatment (Fig. 6A, panel 2). These kinetics of VCL hnRNA synthesis are in agreement with the pol II ChIP results, which revealed maximum pol II occupancy at the VCL promoter following 30 min of PDGF treatment (Fig. 5B). The hnRNA levels of these genes increased much earlier than their mRNA levels (Fig. 2 and Fig. 6B), indicating that mechanisms following the initiation of transcription and the start of productive elongation are resulting in delayed mRNA induction.
A third group of 11 delayed primary response genes (48%) (Fig. 6A, panel 3) showed intermediate delays in transcription of their hnRNAs. These genes are less clearly defined in terms of kinetics of transcription, although it is likely that steps both preceding and following the start of productive elongation may play a role in their delayed mRNA inductions.
Primary Transcript Feature Analysis—The above results indicated that although delays in transcription initiation and/or the start of productive elongation contributed to the lag in expression of some delayed primary response genes, other differences in transcription or processing also contributed to the delay in mRNA formation. We therefore analyzed other features that might affect rates of transcription or mRNA processing, including primary transcript length and intron/exon structure.
Because processing of pre-mRNA is a potential rate-limiting step in gene expression, variations in 5' donor and 3' acceptor splice sites could indicate a general difference in splicing efficiency between the classes of primary response genes. We therefore compared the 5' and 3' splice site nucleotide compositions for the immediate-early and delayed primary response genes. However, there was no significant difference between the splice site characteristics of these groups of genes (supplemental Fig. 2). Next, the primary transcript length and exon frequency distributions for immediate-early and delayed primary response genes were compared with each other and to a distribution of all genes in the Entrez Gene data base (Fig. 7). The analysis indicated a significant difference in both the primary transcript length (p = 4.2 × 10⁻⁸) and exon frequency (p = 1.4 × 10⁻⁴) distributions of immediate-early genes relative to the genome-wide distributions (Fig. 7) (see supplemental Table 5 for individual genes). In contrast, no significant differences were noted when these features of delayed primary response genes were compared with the genome-wide distribution. Furthermore, the immediate-early primary transcripts were significantly shorter than those of the delayed primary response genes (on average, ~19 kb versus ~58 kb, respectively, p = 2.5 × 10⁻⁹) and contained significantly fewer exons (on average, 5.8 versus 10.4, respectively, p = 1.4 × 10⁻⁴). These results suggested that, in addition to other gene features, the observed lag in mature mRNA induction of some delayed primary response genes may be related to both primary transcript length and exon frequency.
| DISCUSSION |
|---|
37, 44, and 19% of the genes induced within 4 h of PDGF stimulation, respectively. Similar kinetics of induction of representative delayed primary response genes were observed in response to the alternative mitogens EGF and serum, suggesting that their induction kinetics are not PDGF-specific events. Examples of delayed primary response genes have been observed by others in primary human fibroblasts (3, 6, 29), rat arterial smooth muscle cells (4), and mouse 3T3 cells (5, 7, 8), but the large number of primary response genes we found to be induced with such delayed kinetics was unexpected, suggesting a more complex regulatory landscape in mammalian cells. Transcriptional programs are often represented as gene networks, where products of expressed genes activate or repress secondary downstream gene targets. Many analyses assume temporal regulation according to the canonical immediate-early/secondary response gene paradigm to infer protein-gene interactions from correlations in gene expression data (33). By highlighting the unexpectedly high incidence of delayed primary response genes, our results have broad implications for analyses that infer regulatory interactions from temporal correlations in gene expression. Because many genes that are induced with a significant lag after growth factor stimulation are still primary response genes, it cannot be assumed that temporally delayed gene expression requires the prior induction of upstream transcriptional regulators.
We also examined the basis for the distinct kinetics of induction of immediate-early and delayed primary response gene mRNAs. Analysis of hnRNA demonstrated that both immediate-early and delayed primary response genes were induced at the transcriptional level. The hnRNAs of immediate-early genes were rapidly induced, coincident with the rapid inductions of their mRNAs. The lag in induction of a number of delayed primary response mRNAs appeared to result from either a delay in transcription initiation or the start of productive elongation, as suggested by the delayed inductions of their hnRNAs. In contrast, hnRNAs of other delayed primary response genes were rapidly induced, suggesting that the lag in mRNA induction resulted from delays in subsequent stages of transcriptional elongation or processing. These differences between the kinetics of induction of immediate-early and delayed primary response gene mRNAs appear to be associated with a combination of factors, including the over-representation of upstream binding sites for shared transcription factors, core promoter elements, gene length, and exon frequency.
Computational comparisons revealed striking differences in the prevalence of predicted binding sites for shared transcription factors in the upstream regions of immediate-early and delayed primary response genes. Binding sites for several known regulators, including SRF, AP-1, CREB, KROX, and NF-κB, were over-represented in the upstream regions of immediate-early genes compared with other genes that were expressed in T98G cells but not induced by PDGF. In contrast, binding sites for either these or other transcription factors were not significantly over-represented upstream of the delayed primary response genes. The absence of predicted binding site enrichment upstream of the delayed primary response genes may indicate that, although immediate-early genes are activated by a shared set of transcription factors, the delayed primary response genes are controlled by a more diverse set of regulators, which would not be identified as over-represented in the gene set. Alternatively, it is possible that delayed primary response genes contain fewer clusters of transcription factor binding sites near their promoters than immediate-early genes, or that the transcription factor binding sites upstream of delayed primary response genes are lower affinity sites than those upstream of immediate-early genes, because lower affinity sites that are divergent from the binding site matrix might not be scored in the computational analysis. Both of these factors could reduce the affinity of transcription factor binding to the promoter regions of delayed primary response genes, correspondingly reducing their rates of transcriptional activation.
The core promoters of the immediate-early genes also differed from those of the delayed primary response genes. In particular, promoters of the immediate-early genes contained higher affinity TATA boxes than those of the delayed primary response genes. Similarly, the prevalence of TATA boxes in the promoters of immediate-early genes (59%) was significantly higher than in the promoters of delayed primary response genes (34%) or in all genes in the genome (22%). This may have important implications in transcription initiation, with higher affinity TATA boxes conferring greater transcriptional activity on the promoters of immediate-early genes. Reinforcing the notion that the immediate-early genes have stronger, more defined initiation is the demonstration that these genes also have a significant bias for the single peak promoter class defined by CAGE analysis (25). Moreover, because some components of the transcription initiation complex, including TBP, remain bound to DNA following pol II promoter clearance, the stability of these factors may modulate the transcription reinitiation rate. Thus, high scoring TATA boxes present in immediate-early promoters may represent higher affinity TBP-binding sites that confer rapid reinitiation (35). Indeed, previous work demonstrated instability of TBP-TATA interactions following the first round of transcription (36), and noncanonical TATA box sequences diminish binding of TFIIA (37), a general transcription factor that is thought to stabilize the TBP-TATA complex (38).
The differences in both upstream transcription factor binding sites and core promoters are also consistent with differences in the binding of RNA polymerase II to the promoter regions of immediate-early and delayed primary response genes. Chromatin immunoprecipitation indicated that pol II was bound to the promoters of both immediate-early and delayed primary response genes in unstimulated cells, and that pol II occupancy increased on the promoters of about one-third of the genes in both groups following growth factor stimulation. Thus, transcriptional induction of the majority of immediate-early and delayed primary response genes may result from the start of productive elongation by a paused polymerase, rather than by recruitment of pol II to the preinitiation complex. These findings are consistent with previous demonstrations of paused polymerases near the transcription start sites of immediate-early genes, including FOS and MYC (31), as well as with global analyses that have detected preinitiation complexes at the promoters of many nontranscribed genes in human cells (39). Importantly, however, the amount of pol II bound to the promoters of immediate-early genes was significantly greater than that bound to the promoters of delayed primary response genes. These differences in pol II occupancy highlight a key distinction between the immediate-early and delayed primary response gene promoters. Together with the differences in both upstream transcription factor binding sites and TATA boxes, these findings point to transcription initiation, and perhaps reinitiation, as one of the primary mechanisms for rapid responses of immediate-early genes to growth factor stimulation relative to the delayed primary response genes.
Our analysis also revealed significant differences between the immediate-early and delayed primary response genes in both primary transcript lengths and exon frequencies. The immediate-early genes tend to be shorter and contain fewer exons than the delayed primary response genes, which are similar in length and exon frequency to other genes in the genome. These transcript features may contribute significantly to the lag in mRNA expression of delayed primary response genes, particularly for those genes that displayed a rapid induction of transcription, as detected by hnRNA. VCL provides an extreme example of the possible effect of primary transcript length and exon frequency on kinetics of mRNA expression. Analysis of hnRNA established that transcription of VCL was rapidly initiated, similar to immediate-early genes, such as FOS and MCL-1. Consistent with its rapid transcriptional induction, SRF has been reported to be a key inducer of VCL (40). However, the accumulation of VCL mRNA was delayed by 2–3 h compared with the hnRNA. This lag in mature VCL mRNA production may be explained by the 122-kb primary transcript length, which is more than six times the average immediate-early gene primary transcript length, and the presence of 22 exons, which is almost four times the average number of immediate-early gene exons. At the other extreme, DKK (another delayed primary response gene) has a primary transcript of only 3.3 kb containing four exons, comparable with that of the shortest immediate-early genes. In contrast to VCL, transcriptional induction of DKK is delayed for 2–3 h after growth factor stimulation, coincident with increased pol II occupancy at its promoter. DKK may therefore represent an example of a gene whose delayed induction results primarily from a lag in pol II recruitment and transcription initiation.
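The contrast between VCL and DKK can be made concrete with a back-of-the-envelope calculation. The sketch below is illustrative only: it takes the primary transcript lengths quoted above and computes how long one polymerase would need to traverse each gene at an assumed constant elongation rate. The rate of 2 kb/min is an assumption chosen for illustration and is not a value measured in this study. The function names are likewise hypothetical.

```python
# Rough estimate of pol II transit time across a primary transcript.
# ASSUMPTION: the elongation rate is an illustrative constant, not a
# value reported in this study; real rates vary between genes.
ASSUMED_ELONGATION_KB_PER_MIN = 2.0

# Primary transcript lengths (kb) and exon counts quoted in the text.
GENES = {
    "VCL": {"length_kb": 122.0, "exons": 22},
    "DKK": {"length_kb": 3.3, "exons": 4},
}


def transit_minutes(length_kb, rate_kb_per_min=ASSUMED_ELONGATION_KB_PER_MIN):
    """Minutes for one polymerase to traverse a transcript of this length."""
    if rate_kb_per_min <= 0:
        raise ValueError("elongation rate must be positive")
    return length_kb / rate_kb_per_min


def summarize(genes=GENES, rate_kb_per_min=ASSUMED_ELONGATION_KB_PER_MIN):
    """Map each gene name to its estimated transit time in minutes."""
    return {
        name: round(transit_minutes(info["length_kb"], rate_kb_per_min), 2)
        for name, info in genes.items()
    }


if __name__ == "__main__":
    for name, minutes in summarize().items():
        exons = GENES[name]["exons"]
        print(f"{name}: {minutes} min at the assumed rate ({exons} exons)")
```

Transit time scales linearly with transcript length, so the ratio between the two genes does not depend on the assumed rate, whereas the absolute times do. The estimate ignores splicing and other processing steps, which the text suggests may also contribute to the lag for genes with many exons.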
Multiple differences between immediate-early and delayed primary response genes thus appear to contribute to the distinct kinetics of induction of their mRNAs. The immediate-early genes are characterized by over-representation of binding sites for several transcription factors in their upstream regions, promoters with high affinity TATA boxes, and short primary transcripts containing relatively few exons. In all of these respects, the delayed primary response genes are similar to other genes in the genome. Additional features, such as chromatin structure, may also distinguish immediate-early from delayed primary response genes, as has been reported for genes displaying rapid versus delayed inductions in response to other stimuli (41–43).
To determine whether these characteristics of immediate-early genes were consistent in other cell types, we analyzed the features of immediate-early genes induced by the mitogenic stimuli EGF in HeLa cells and serum in MCF10A cells (normal human breast epithelial cells) in published data sets (44). As in T98G cells, the immediate-early genes induced in both HeLa and MCF10A cells showed an over-representation of transcription factor binding sites, including sites for SRF, AP1, CREB, KROX, and NF-κB, which were conserved in mouse and dog (supplemental Table 7; for complete Transfac output, see supplemental Table 8). Likewise, immediate-early genes in HeLa and MCF10A cells had significantly higher TATA scores, lower exon frequencies, and shorter transcript lengths as compared with the genome as a whole (supplemental Figs. 3 and 4). Thus, the immediate-early genes induced in T98G, HeLa, and MCF10A cells by three different mitogens share common characteristics of genomic organization.
The multiple features associated with rapid induction of immediate-early genes may have been selected for based on the functions of immediate-early gene products as transcriptional regulators that mediate subsequent alterations in gene expression in response to growth factor stimulation. The rapid induction of immediate-early genes might be expected to play an important role in achieving a robust cellular response to extracellular signals. In contrast, the lag in induction of both delayed primary and secondary response genes is consistent with the apparent functions of these genes as effectors rather than mediators of growth factor signaling. Thus, immediate-early genes are characterized not only by a lack of requirement for new protein synthesis prior to their transcriptional induction but also by distinct genomic features that may have been selected to confer rapid inducibility.
| FOOTNOTES |
|---|
The microarray gene expression data from this study have been submitted to GEO (Gene Expression Omnibus) under accession number GSE8315.
The on-line version of this article (available at http://www.jbc.org) contains supplemental Tables 1–8 and Figs. 1–4.
1 These authors contributed equally to this work.
2 Present address: Pfizer Inc., Research Technology Center, 620 Memorial Dr., Cambridge, MA 02139.
3 To whom correspondence should be addressed: Dept. of Biology, Boston University, 5 Cummington St., Boston, MA 02215. Tel.: 617-353-8735; Fax: 617-353-8484; E-mail: gmcooper{at}bu.edu.
4 The abbreviations used are: PDGF, platelet-derived growth factor; EGF, epidermal growth factor; RT, reverse transcription; hnRNA, heterogeneous nuclear RNA; IEG, immediate-early gene; D-PRG, delayed primary response gene; SRF, serum-response factor; CREB, cyclic AMP-response element-binding protein; MED-1, multiple start site element downstream; CAGE, cap analysis of gene expression; pol II, RNA polymerase II; TBP, TATA-binding protein; ChIP, chromatin immunoprecipitation; h, human; GO, Gene Ontology.
5 M. E. Schaffer, G. M. Cooper, S. Kasif, manuscript in preparation.
| REFERENCES |
|---|
|
|
|---|