Immediate-Early and Delayed Primary Response Genes Are Distinct in Function and Genomic Architecture

The transcriptional program induced by growth factor stimulation is classically described in two stages as follows: the rapid protein synthesis-independent induction of immediate-early genes, followed by the subsequent protein synthesis-dependent induction of secondary response genes. In this study, we obtained a comprehensive view of this transcriptional program. As expected, we identified both rapid and delayed gene inductions. Surprisingly, however, a large fraction of genes induced with delayed kinetics did not require protein synthesis and therefore represented delayed primary rather than secondary response genes. Of 133 genes induced within 4 h of growth factor stimulation, 49 (37%) were immediate-early genes, 58 (44%) were delayed primary response genes, and 26 (19%) were secondary response genes. Comparison of immediate-early and delayed primary response genes revealed functional and regulatory differences. Whereas many immediate-early genes encoded transcription factors, transcriptional regulators were not prevalent among the delayed primary response genes. The lag in induction of delayed primary response compared with immediate-early mRNAs was because of delays in both transcription initiation and subsequent stages of elongation and processing. Consistent with increased abundance of RNA polymerase II at their promoters, immediate-early genes were characterized by over-representation of transcription factor binding sites and high affinity TATA boxes. Immediate-early genes also had short primary transcripts with few exons, whereas delayed primary response genes more closely resembled other genes in the genome. These findings suggest that genomic features of immediate-early genes, in contrast to the delayed primary response genes, are selected for rapid induction, consistent with their regulatory functions.

The binding of growth factors to cell surface receptors leads to the activation of signaling pathways that ultimately control cell proliferation, differentiation, and survival. The critical targets of these signaling cascades include transcription factors, and many of the changes in cell behavior resulting from growth factor stimulation are because of altered programs of gene expression. The canonical model of a highly ordered program of gene expression induced by growth factor stimulation is the coordinate regulation of primary and secondary response genes. The initial transcriptional response to growth factor stimulation is the induction of ~100 primary response genes (1,2). Induction of these genes does not require de novo protein synthesis and is therefore mediated by pre-existing transcription factors. Most of the characterized primary response genes (termed immediate-early genes) are maximally induced within 30 min of growth factor stimulation, although a few examples of primary response genes that are induced with slower kinetics have been described (3-8).
Many of the well characterized primary response genes encode transcription factors, which regulate downstream secondary response genes as part of a larger transcriptional program (1,2). Secondary response genes are induced later than immediate-early genes, and their induction is distinct from that of primary response genes in requiring de novo protein synthesis. Thus, the generally accepted model of growth factor-induced gene expression has two major components: the initial induction of primary response (immediate-early) genes, followed by a compulsory delay allowing translation of their mRNAs to produce the transcription factors that then induce the secondary response genes.
In this study, we employed global expression profiling to analyze the temporal program of transcriptional alterations induced by growth factor stimulation of human cells. As expected, we identified distinct patterns of rapid and delayed gene inductions. Surprisingly, however, we observed that a large fraction of delayed inductions did not require protein synthesis, and therefore represented delayed induction of primary response genes rather than induction of secondary response genes. These results suggested that the transcriptional program induced by growth factor stimulation involved not only the induction of immediate-early and secondary response genes but also the induction of a large group of delayed primary response genes that had previously been unrecognized.
The delayed primary response genes differed from immediate-early genes in both their functions and genomic architecture. Whereas many immediate-early genes encode transcription factors, transcriptional regulators were not prevalent among the delayed primary response genes. Rapid transcriptional induction of immediate-early genes was associated with several unique characteristics of these genes, including overrepresentation of shared transcription factor binding sites in upstream sequences of this gene set, high affinity TATA boxes in their core promoters, and short primary transcripts with few exons. In all of these features, delayed primary response genes more closely resembled other genes in the genome. These findings distinguish immediate-early from delayed primary response genes in terms of both function and transcriptional regulation, and suggest that immediate-early genes may have been selected for rapid induction based on their functions as transcriptional regulators. In contrast, the slower induction of both delayed primary and secondary response genes is consistent with their activities as effectors rather than mediators of growth factor signaling.

EXPERIMENTAL PROCEDURES
Cell Culture and RNA Extraction-T98G human glioblastoma cells were grown in minimal essential medium (Invitrogen) supplemented with fetal calf serum (10%). Cells were rendered quiescent by incubation in serum-free medium for 72 h and either left unstimulated or stimulated for the indicated times with human platelet-derived growth factor (PDGF)-BB (50 ng/ml) (Sigma), epidermal growth factor (EGF) (30 ng/ml) (Calbiochem), or 20% fetal calf serum. When called for, cycloheximide (10 µg/ml, a concentration that inhibits protein synthesis >90% in T98G cells) was added 30 min prior to PDGF addition. Total RNA for real time reverse transcription (RT)-PCR for microarray validations and heteronuclear RNA (hnRNA) analysis was extracted with TRIzol reagent (Invitrogen). Following ethanol precipitation, total RNA was applied to an RNeasy column (Qiagen) for further purification and treated with DNase according to the manufacturer's protocols. RNA for microarray experiments was extracted with TRIzol reagent (Invitrogen) followed by poly(A)+ RNA isolation with an Oligotex mRNA midi kit (Qiagen) according to each manufacturer's protocol.
Oligonucleotide Array Spotting-Microarrays were fabricated by resuspending 21,329 70-mer oligonucleotides from the Human Genome Array-Ready Oligo Set version 2.0 (Operon) in 3× SSC to a final 30 µM concentration and spotting them onto aminosilane-coated GAPS II slides (Corning Glass) with an OmniGrid Accent microarrayer (GeneMachines). After drying, the slides were post-processed according to the oligonucleotide protocol provided by the manufacturer. Briefly, to promote spot uniformity, the microarrays were rehydrated with nuclease-free water and snap-dried on a 100°C hot plate. The slides were UV cross-linked with 65 mJ of energy and shaken for 20 min in a blocking solution of 1-methyl-2-pyrrolidinone, 171 mM succinic anhydride, and 43 mM sodium borate. Finally, the slides were washed successively in water and 95% ethanol and then centrifuged at 800 rpm for 5 min to dry.
Microarray Sample Preparation, Hybridization, and Image Analysis-Starting with 100 ng of poly(A)+ RNA, one round of RNA amplification was performed with the MessageAmp aRNA amplification kit (Ambion) using a 4:1 amino-allyl UTP:UTP ratio for aRNA incorporation. For each sample, 8 µg of aRNA was coupled to N-hydroxysuccinimidyl esters of cyanine-3 or cyanine-5 (Amersham Biosciences). Following clean up, treated and untreated aRNA samples with opposing cyanine labels were combined, concentrated, and treated with a fragmentation reagent (Ambion) according to the manufacturer's protocol. For each slide, 4 µg of both treated and untreated cyanine-labeled aRNA samples were combined with a hybridization buffer (2.3× SSC, 18 mM HEPES, 0.2 mg/ml bovine serum albumin, 0.6 mg/ml poly(A), 0.2% SDS), heat-denatured for 3 min at 95°C, and applied to microarrays under a LifterSlip coverslip (Erie Scientific). The slides were placed in a hybridization chamber (Dietech) and incubated in a 63°C water bath for 16 h. Following hybridization, the slides were successively washed in 0.6× SSC with 0.025% SDS, 0.05× SSC, and water and then dried by centrifugation at 1000 rpm for 3 min. The microarrays were scanned with an Axon 4000B scanner and adaptive spot segmentation performed with GenePix Pro software (version 5.0) (Axon Instruments). For each treated sample, three independent replicate microarray experiments were performed.
Microarray Data Analysis-Triplicate dye-swap, background-subtracted median intensity values were used as input to the LIMMA analysis package (9) in Bioconductor (10), and average LOESS-corrected log2 ratios were used to estimate differential gene expression. For the PDGF-treated samples, genes with positive log2 ratios greater than or equal to 1 (2-fold) relative to untreated samples and FDR-corrected (11) moderated t test p values less than 0.01 were considered differentially expressed. Additional microarray dye-swap experiments were performed to identify genes with PDGF-stimulated inductions that were independent of new protein synthesis. RNA was extracted from cells treated with cycloheximide for 30 min followed by 2 or 4 h PDGF treatments. For each of three replicates, two microarray experiments were performed with different reference samples. The first compared cycloheximide and PDGF-treated samples to untreated samples, whereas the other compared cycloheximide and PDGF-treated samples to PDGF-treated samples.
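The selection rule described above (log2 ratio of at least 1 and an FDR-corrected p value below 0.01) can be illustrated with a minimal Python sketch. It assumes that the FDR correction is the Benjamini-Hochberg step-up procedure, which the text does not state explicitly, and it takes precomputed log2 ratios and moderated t test p values as input rather than reproducing the LIMMA analysis. The function names and gene labels are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p values in the input order.

    Assumption: the FDR correction cited in the text behaves like the
    standard BH step-up procedure.
    """
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p value to the smallest, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def induced_genes(log2_ratios, pvalues, min_log2=1.0, max_fdr=0.01):
    """Genes with log2 ratio >= min_log2 and adjusted p value < max_fdr.

    log2_ratios and pvalues are dicts keyed by gene name (hypothetical input
    layout); the thresholds default to those given in the text.
    """
    names = list(log2_ratios)
    adjusted = benjamini_hochberg([pvalues[g] for g in names])
    return [g for g, q in zip(names, adjusted)
            if log2_ratios[g] >= min_log2 and q < max_fdr]


# Illustrative values only, not data from this study.
example_ratios = {"gene_a": 2.5, "gene_b": 0.4, "gene_c": 1.3}
example_pvalues = {"gene_a": 0.0001, "gene_b": 0.0002, "gene_c": 0.2}
selected = induced_genes(example_ratios, example_pvalues)
```

In this toy input only gene_a passes both criteria: gene_b is significant but below 2-fold, and gene_c exceeds 2-fold but is not significant.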
Real Time RT-PCR for Microarray Validations and Heteronuclear RNA Measurements-Reverse transcription of 0.5 µg of total RNA was performed in 50 µl using SYBR Green RT-PCR reagents and random hexamer primers (Applied Biosystems) as recommended by the manufacturer. Following a 95°C incubation for 10 min, 40 cycles of PCR (95°C/15 s; 60°C/1 min) were then performed on an ABI Prism 7900HT sequence detection system with 0.5 µl of the RT reaction, 100 nM PCR primers (supplemental Table 1), and SYBR Green PCR master mix in 5-µl reactions. Threshold cycles (C_T) for three replicate reactions were determined using Sequence Detection System software (version 2.2.2), and relative transcript abundance was calculated following normalization with a glyceraldehyde-3-phosphate dehydrogenase PCR amplicon. Amplification of only a single species was verified by a dissociation curve for each reaction.
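One common way to turn threshold cycles into relative transcript abundance is the comparative threshold cycle calculation, sketched below in Python. The text states only that values were normalized to a glyceraldehyde-3-phosphate dehydrogenase amplicon; the exact formula, the assumption of perfect doubling per cycle, and all names in the sketch are assumptions made for illustration.

```python
def mean(values):
    """Arithmetic mean of a non-empty sequence of threshold cycles."""
    return sum(values) / len(values)


def relative_abundance(ct_target_treated, ct_ref_treated,
                       ct_target_control, ct_ref_control):
    """Fold change of a target transcript, treated versus control.

    Each argument is a list of replicate threshold cycles. The reference
    amplicon stands in for the GAPDH normalizer named in the text.
    Assumption: amplification efficiency is 100% (product doubles each cycle).
    """
    delta_treated = mean(ct_target_treated) - mean(ct_ref_treated)
    delta_control = mean(ct_target_control) - mean(ct_ref_control)
    return 2.0 ** -(delta_treated - delta_control)


# Illustrative threshold cycles only: the target crosses threshold three
# cycles earlier after treatment while the reference is unchanged.
fold = relative_abundance([22.0, 22.0, 22.0], [18.0, 18.0, 18.0],
                          [25.0, 25.0, 25.0], [18.0, 18.0, 18.0])
# fold == 8.0
```

A three-cycle shift corresponds to an 8-fold induction under the doubling assumption; lower real-world efficiencies would give a smaller fold change for the same shift.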
Chromatin Immunoprecipitation-Chromatin immunoprecipitations were performed as described previously (12), with modifications. Chromatin was immunoprecipitated overnight at 4°C using 6.25 µg/ml anti-pol II antibody (N-20) (sc-899, Santa Cruz Biotechnology). Protein A-agarose beads were washed successively in low salt wash, high salt wash, LiCl wash, and twice in 1× TE. Immunoprecipitated chromatin was quantified with real time PCR using primers designed in proximity to the transcription start site as annotated in Entrez Gene (see supplemental Table 1 for primer sequences).
Gene Feature and Genomic Sequence Data-Unless otherwise noted, all gene and transcript annotations, including genomic positions of transcription initiation sites and exon/intron boundaries for 23,969 human RefSeq transcripts, were obtained from the Entrez Gene data base, corresponding to human genome build version 36.1 (13). Model transcripts (RefSeq accession numbers with "XM_" prefix) and transcripts mapped to alternate human contig assemblies (RefSeq accession numbers with "AC_" prefix) were not included in these analyses. For the core promoter analysis and splice site characterization, genomic sequence data were extracted from assembled RefSeq chromosome sequences for human genome build version 36.1.
Gene Ontology Analysis-Gene Ontology (GO) terms were obtained from the Entrez Gene data base for all human genes, and transitive closure of each term relationship was extracted from the daily GO build (14). Functional enrichment of coexpressed gene sets was determined with a one-tailed Fisher's exact test (15) by comparing the frequency of each term and all of its ancestor terms against the expected frequency from all annotated genes on the microarray. Only genes with at least one GO annotation were included in the analysis.
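The one-tailed enrichment test described above can be sketched in Python as a hypergeometric upper-tail probability, which is what a one-tailed Fisher's exact test computes for a 2 × 2 table. The sketch assumes that the gene set is a subset of the annotated genes on the array; the function name and the example counts are hypothetical.

```python
from math import comb


def fisher_enrichment_p(hits_in_set, set_size, hits_on_array, array_size):
    """One-tailed Fisher's exact test p value for enrichment of a GO term.

    hits_in_set   : genes in the gene set annotated with the term
    set_size      : annotated genes in the gene set
    hits_on_array : annotated genes on the array carrying the term
    array_size    : all annotated genes on the array (includes the gene set)

    Returns P(X >= hits_in_set) when set_size genes are drawn without
    replacement from the array.
    """
    upper = min(set_size, hits_on_array)
    tail = sum(
        comb(hits_on_array, k) * comb(array_size - hits_on_array, set_size - k)
        for k in range(hits_in_set, upper + 1)
    )
    return tail / comb(array_size, set_size)


# Toy example: 5 of 10 genes carry a term and all 5 genes in the set have it.
p_toy = fisher_enrichment_p(5, 5, 5, 10)
# p_toy == 1 / 252
```

Exact integer arithmetic with `math.comb` avoids the rounding problems of factorial-based formulas, at the cost of speed for very large arrays.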
Transcription Factor Binding Site Analysis-Over-representation of transcription factor binding sites in the upstream regions of immediate-early and delayed primary response genes was analyzed as described previously (12,16), using the program Tractor with 588 vertebrate transcription factor binding site matrices from TRANSFAC Professional (version 11.1) (17) and "minSUM" Match thresholds (18). For each matrix, the predicted site frequencies per gene for both the immediate-early and delayed primary response gene sets were compared with the site frequencies per gene observed in the upstream regions of 350 background genes using a permutation test. These background genes were randomly selected from genes expressed but not induced by PDGF on the microarrays. Two independent analyses were performed. The first used only human sequences for predictions, and the second considered only sites predicted within the same position of a human-dog-mouse multiple sequence alignment of each upstream region. Sequences and MULTIZ alignments were selected using the genome browser at the University of California, Santa Cruz (human, mouse, and dog versions hg18, mm8, and canFam2, respectively) (19). For both analyses, the results were filtered to only include matrices that predicted, on average, less than 1 site per kb in background sequences and that detected sites upstream of at least 10% of the gene set being tested. p values were adjusted with a false-discovery rate correction (11). Only those matrices meeting the criteria of less than 1 hit per kb of upstream sequence in the background set and at least one hit in 10% of the test genes were considered in the correction.
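The comparison of site frequencies per gene between a test set and the background set can be illustrated with a generic label-shuffling permutation test, sketched below in Python. The text does not describe the permutation scheme used by Tractor, so the test statistic (mean sites per gene), the one-sided direction, the number of permutations, and all names are assumptions made for illustration.

```python
import random


def permutation_p(test_counts, background_counts, n_perm=2000, seed=0):
    """One-sided permutation p value for a higher mean site count per gene.

    test_counts and background_counts hold the number of predicted sites per
    gene for one matrix. Gene labels are shuffled n_perm times, and the p
    value is the fraction of shuffles whose test-set mean is at least the
    observed mean (with the usual +1 correction).
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    n_test = len(test_counts)
    observed = sum(test_counts) / n_test
    pooled = list(test_counts) + list(background_counts)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if sum(pooled[:n_test]) / n_test >= observed:
            at_least_as_extreme += 1
    return (at_least_as_extreme + 1) / (n_perm + 1)


# Illustrative counts only: sites concentrated in the test genes.
p_enriched = permutation_p([5, 5, 5, 5, 5], [0] * 20)
# Identical counts everywhere give no evidence of enrichment.
p_null = permutation_p([1, 1, 1], [1, 1, 1, 1])
```

The resulting p values for all matrices that pass the filtering criteria would then be adjusted together for multiple testing, as described in the text.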
Core Promoter Analysis-The promoters for immediate-early and delayed primary response genes were scanned using the Match algorithm with no score thresholds (18) and position-specific scoring matrices representing six core promoter elements (supplemental Fig. 1). The regions −48 to −21, −55 to −5, −13 to +15, +7 to +38, +17 to +43, and +89 to +177 relative to the transcription initiation site were scanned for the TFIIB recognition element (20), TATA box, initiator, motif 10 element (21), downstream core promoter element (22), and multiple start site element downstream (MED-1) (23), respectively. For each core promoter element, the highest scoring position within each window on the forward strand was recorded for each transcript. Because some genes encode multiple transcripts, the maximum scores among all transcripts were determined for each human gene in the Entrez Gene data base. Assessment of the biological significance of TATA prediction scores was performed as described previously (12). To identify at least 95% of sequences in three classes of TATA-binding sites defined previously (24), "TATAAA," "TAAATA," and "TATATA," a threshold of 0.7 was selected. This threshold was then used to identify the genome-wide frequency of TATA boxes predicted between −55 and −5 upstream of 23,969 human RefSeq transcripts.
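The window scan described above (record the highest scoring position on the forward strand, then apply a 0.7 threshold for the TATA box) can be sketched in Python as follows. This is a simplified stand-in for the Match algorithm: it rescales a plain sum of matrix entries between the minimum and maximum attainable values and omits the information-content weighting that Match applies. The matrix values and function names are hypothetical, and the window is assumed to be already extracted and to contain only A, C, G, and T.

```python
# Hypothetical position count matrix for a 6-bp TATA-like element.
# The values are illustrative and are not taken from the matrices used here.
TATA_MATRIX = [
    {"A": 1, "C": 1, "G": 1, "T": 17},
    {"A": 17, "C": 1, "G": 1, "T": 1},
    {"A": 1, "C": 1, "G": 1, "T": 17},
    {"A": 17, "C": 1, "G": 1, "T": 1},
    {"A": 14, "C": 1, "G": 1, "T": 4},
    {"A": 17, "C": 1, "G": 1, "T": 1},
]


def site_score(matrix, site):
    """Score one candidate site, rescaled to the range 0 to 1."""
    raw = sum(column[base] for column, base in zip(matrix, site))
    lowest = sum(min(column.values()) for column in matrix)
    highest = sum(max(column.values()) for column in matrix)
    return (raw - lowest) / (highest - lowest)


def best_window_score(matrix, window):
    """Highest site score over all forward-strand positions in the window."""
    width = len(matrix)
    if len(window) < width:
        raise ValueError("window is shorter than the matrix")
    return max(site_score(matrix, window[i:i + width])
               for i in range(len(window) - width + 1))


def has_tata_box(matrix, window, threshold=0.7):
    """Apply the score threshold given in the text to the best window score."""
    return best_window_score(matrix, window) >= threshold


# Illustrative sequences only.
score_with_tata = best_window_score(TATA_MATRIX, "GGCGCTATAAAGGCG")  # 1.0
score_without = best_window_score(TATA_MATRIX, "CCCCCCCCCC")         # 0.0
```

For a gene with several transcripts, the per-gene score would be the maximum of `best_window_score` over the windows of all its transcripts, mirroring the rule stated in the text.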
Cap Analysis of Gene Expression Tag Classification-Human data for cap analysis of gene expression (CAGE) tag clusters and their attributes, and the classification of tag clusters (25), were combined into a single relational data base. For each RefSeq identifier, a representative transcription start site with maximal tag frequency and the associated class identifier were extracted. Carninci et al. (25) limited the cluster annotation to those with at least 100 tags; therefore, only 5,755 unique RefSeq transcripts were associated with a particular class. Classification information was available for 18 of 46 immediate-early genes, 21 of 50 delayed primary response genes, and 6 of 17 secondary response genes. To test for enrichment of each class in the primary response gene sets, one-sided Fisher's exact test p values were calculated.

Kinetics of Primary Response Gene Induction-Microarray analysis was used to measure changes in gene expression following growth factor stimulation of quiescent human T98G cells, which can be reversibly arrested in the G0 state by growth factor deprivation (26,27). Cells were rendered quiescent by serum deprivation and then stimulated to re-enter the cell cycle by treatment with PDGF for 0.5, 2, and 4 h. To distinguish primary from secondary response genes, transcript levels were also determined following 2 and 4 h of PDGF treatment in the presence of the protein synthesis inhibitor cycloheximide.
The data for all genes that were induced greater than 2-fold (p < 0.01) are presented as a heat diagram in Fig. 1 (microarray data are presented in supplemental Table 2). Of a total of 133 induced genes, 49 were induced >2-fold by 0.5 h, characteristic of immediate-early genes. This group of genes included several well known immediate-early genes such as FOS, FOSB, JUN, NR4A1, NR4A2, and MCL1. In addition, a number of these genes were super-induced in the presence of cycloheximide, as observed previously for immediate-early genes. In contrast, a total of 84 genes were induced >2-fold only after 2-4 h of PDGF treatment. The initial inductions of 26 of these genes were inhibited at least 50% by cycloheximide, as expected for secondary response genes that require de novo protein synthesis for transcription. These genes included well characterized secondary response genes, such as MMP3 (1) and MMP13 (28). Surprisingly, induction of the remaining 58 genes was not blocked by cycloheximide, even though significant induction of these mRNAs required 2-4 h of PDGF treatment. It is noteworthy that the number of primary response genes exhibiting these delayed kinetics of induction (delayed primary response genes) exceeded both the number of immediate-early genes and secondary response genes induced in these experiments.
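The three-way classification described in this paragraph can be summarized as a small decision rule, sketched below in Python. The sketch assumes that "inhibited at least 50%" corresponds to a log2 ratio of −1 or lower in the comparison of cycloheximide plus PDGF-treated samples with PDGF-treated samples; the text does not give the exact computation, and the function and argument names are hypothetical.

```python
def classify_gene(first_induction_h, log2_chx_pdgf_vs_pdgf):
    """Assign an induced gene to one of the three response classes.

    first_induction_h     : earliest time point (h) with >2-fold induction
    log2_chx_pdgf_vs_pdgf : log2 ratio of cycloheximide + PDGF versus PDGF
                            alone at that time point
    Assumption: a ratio of 0.5 or less (log2 <= -1) means the induction was
    inhibited at least 50% by cycloheximide.
    """
    if first_induction_h <= 0.5:
        return "immediate-early"
    if log2_chx_pdgf_vs_pdgf <= -1.0:
        return "secondary response"
    return "delayed primary response"


# Illustrative inputs only, not measurements from this study.
classes = [
    classify_gene(0.5, 1.2),   # rapid and super-induced by cycloheximide
    classify_gene(2.0, -2.0),  # delayed and blocked by cycloheximide
    classify_gene(4.0, 0.1),   # delayed but not blocked
]

# The class sizes reported in the text account for all induced genes.
assert 49 + 58 + 26 == 133
```

Applied to the genes in Fig. 1, a rule of this form yields the split into immediate-early, secondary response, and delayed primary response genes reported above.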
The induction kinetics of several representative genes were analyzed using real time RT-PCR with a finer resolution time course following 0.5, 1, 2, 3, 4, 5, and 6 h of PDGF treatment (Fig. 2). Consistent with the current microarray results and previous studies (16,29), two well characterized primary response genes, FOS and MCL1, exhibited rapid but transient inductions that peaked at 0.5 h following PDGF treatment (Fig. 2A). MMP3, a known secondary response gene, was not significantly induced until 3 h of PDGF treatment and was blocked by cycloheximide, confirming the array results (Fig. 2B). In contrast to FOS and MCL1, five delayed primary response genes, VCL, PLOD2, DKK1, SOD2, and CCND1, demonstrated slower inductions, reaching maximal mRNA levels between 1 and 4 h following PDGF stimulation (Fig. 2C). Consistent with the microarray results, induction of these genes was not blocked by cycloheximide, confirming their classification as delayed primary response genes. Additionally, to validate the array results, transcript levels of a total of 19 genes following 0.5, 2, and 4 h of PDGF treatment, in the presence and absence of cycloheximide, were independently tested using quantitative real time RT-PCR, the results of which confirmed the microarray data (supplemental Table 3).
To determine whether representative genes were induced with similar kinetics in response to mitogens other than PDGF, quiescent T98G cells were alternatively stimulated with EGF or serum in the presence or absence of cycloheximide (Fig. 3). Consistent with the results obtained with PDGF, FOS and MCL1 were induced as immediate-early genes, MMP3 as a secondary response gene, and VCL, PLOD2, DKK1, SOD2, and CCND1 as delayed primary response genes following both serum and EGF treatment.

FIGURE 1. Genes induced by PDGF in quiescent T98G cells. T98G cells were rendered quiescent by serum starvation for 72 h and then stimulated by treatment with PDGF for 0.5, 2, or 4 h. Where indicated, cells were preincubated with cycloheximide (CHX) for 30 min before PDGF treatment. Results of triplicate microarray analyses are presented as a heat diagram illustrating average log2 ratios for genes that were induced greater than 2-fold (p ≤ 0.01) by PDGF. Genes are arranged by the earliest time point at which the PDGF induction exceeded 2-fold (p ≤ 0.01), and classifications are based on the inhibition of PDGF inductions by cycloheximide. Of the 133 genes induced by PDGF, 49 were rapidly induced by 30 min and were considered immediate-early genes. The remaining genes were separated into two groups of 58 delayed primary response genes, which are induced independently of translation, and 26 secondary response genes, whose transcription requires new protein synthesis.

Delayed Primary Response Genes Are Functionally Distinct from Immediate-Early Genes-To gain insight into possible functional differences, the immediate-early, delayed primary response, and secondary response genes were compared using the GO data base. Functional enrichment of GO terms was assessed by analysis of the frequency of GO terms in each set of genes compared with the expected frequency in all annotated genes on the array. The molecular function and cellular component GO terms that were significantly enriched (p < 0.01) and identified at least 10% of the genes in each group are summarized in Table 1. The immediate-early gene set was highly enriched in molecular function terms related to transcriptional regulation, with "DNA binding" and "transcription factor activity" among the most frequently represented categories (Table 1, top). These functions were not significantly enriched in either the delayed primary response or secondary response genes (Table 1, top). Similarly, the cellular component term "nucleus" was highly enriched in the immediate-early genes but not in the delayed primary response or secondary response genes (Table 1, bottom). These findings are consistent with the recognized role of immediate-early genes as encoding transcription factors that then regulate secondary response genes. However, they also suggest distinct functions for the delayed primary response genes, as well as for secondary response genes, compared with the immediate-early genes.
Analysis of Promoters and Upstream Regions-The differing kinetics of induction of immediate-early and delayed primary response genes could result from a variety of factors, alone or in combination, including differences in transcription initiation, elongation, pre-mRNA processing, or mRNA stability. We therefore used a combination of computational and experimental approaches to compare several properties of these groups of genes. Initially, we explored the possibility of differences in the upstream regions of the immediate-early and delayed primary response genes that might be the cause of their distinct kinetics of induction. Co-regulated genes often share similar transcription factor binding sites, and groups of genes demonstrating different kinetics of induction might be expected to be under differential transcriptional control. Upstream regions of immediate-early and delayed primary response genes were therefore analyzed separately for over-represented transcription factor binding sites compared with a background set of genes that were expressed in T98G cells but not induced by PDGF stimulation (12,16). Sequences corresponding to 1, 3, and 5 kb upstream of each human gene, as well as the corresponding orthologous murine and canine sequences, were analyzed with the Match program using 588 vertebrate matrices from TRANSFAC Professional (version 11.1) and a scoring threshold to minimize the sum of false-negative and false-positive (minSUM) hits. Analysis of human sequences alone identified matrices representing four transcription factors, serum-response factor (SRF), nuclear factor-κB (NF-κB, represented by the V$CREL_01 and V$NFKAPPAB65_01 matrices), PAX-3, and early growth response (KROX) transcription factors, as significantly over-represented in upstream sequences of the immediate-early genes (Table 2, top; the entire list can be found in supplemental Table 4). In contrast, upstream regions of the set of delayed primary response genes lacked over-represented binding site matrices for these or other transcription factors.
The analysis was extended with phylogenetic footprinting to identify over-represented binding sites that were conserved in orthologous genomic regions of dog and mouse. The statistical analysis was performed with the same background sequence set, but only sites predicted in all three organisms at the same position of a multiple sequence alignment were scored. The top ten conserved transcription factor binding site matrices with the most significant p values are shown in Table 2, bottom (the entire output can be found in supplemental Table 4). As expected (and as was also observed in the human-only analysis), conserved binding sites for known regulators, including SRF, NF-κB, cyclic AMP response element-binding protein (CREB), and AP-1 (activator protein-1), were significantly over-represented in the upstream regions of immediate-early genes. Notably, neither these nor other transcription factor binding site matrices were significantly over-represented in the set of upstream regions of the delayed primary response genes. Similar results were obtained when scanning 1-, 3-, or 5-kb upstream regions.
The core promoter sequences were also examined for possible differences in binding sites for general transcription factors. The core promoter includes the TATA box, the TFIIB recognition element (20), the initiator, the motif 10 element (21), the downstream core promoter element, frequently found in TATA-less promoters (22), and the multiple start site element downstream (MED-1) (23) (Fig. 4A). Core promoter regions of immediate-early and delayed primary response genes were compared by analysis of the promoter sequences of each gene set near the expected positions for six core promoter elements (Fig. 4A). The highest scoring subsequences within these windows were determined using Match with frequency matrices representing each promoter element (supplemental Fig. 1). For each element, the distributions of scores for the immediate-early and delayed primary response genes were compared with one another and to a genome-wide score distribution with the Wilcoxon rank sum test (Fig. 4B).
The results indicate a significant difference in the TATA scores for the immediate-early genes (p = 5.1 × 10⁻⁷) relative to scores for 18,191 human genes in the Entrez Gene data base, whereas significant differences in scores for the other core promoter elements were not observed. Furthermore, immediate-early gene TATA scores were significantly higher than those for the delayed primary response genes (p = 2.0 × 10⁻³) (Fig. 4B). In contrast, TATA scores of the delayed primary response genes did not differ significantly from all genes in the Entrez Gene data base (TATA scores for all individual genes can be found in supplemental Table 5).
The distributions of TATA scores for the entire Entrez Gene data base, immediate-early genes, and delayed primary response genes are plotted in Fig. 4C. A threshold score of 0.7, which identifies more than 95% of sequences bound by human TATA-binding protein (hTBP) in vitro (24), was used to define a functional TATA box (12). When applied genome-wide, this threshold identified 22% of genes with at least one transcript containing a TATA box, a figure similar to other estimates of TATA box prevalence (30). Using this threshold, 27 of 46 (59%) immediate-early and 17 of 50 (34%) delayed primary response

Summary of GO term analysis
All GO terms in the molecular function and cellular component categories that were enriched in IEG, D-PRG, or secondary response genes (SRG) (p Ͻ 0.01) and were represented in at least 10% of the genes are summarized (entries with significant p values are indicated in boldface). The fraction of genes with each GO term annotation is listed in the % genes column for each gene set.

TABLE 2
Over-represented transcription factor binding sites upstream of immediate-early and delayed primary response genes
1-, 3-, and 5-kb upstream regions were examined for over-represented transcription factor binding sites using TRANSFAC matrices. The complete TRANSFAC output prior to filtering can be found in supplemental
genes contained a TATA box. The TATA box prevalence in immediate-early genes differed significantly (p = 1.0 × 10⁻⁷ by one-sided Fisher's Exact Test) from the Entrez Gene data base. In a survey of the human and mouse transcriptomes, Carninci et al. (25) experimentally identified several classes of transcription start site signatures using CAGE. Transcription start sites of the single peak class were enriched in TATA boxes (25), so we also compared the immediate-early and delayed primary response genes according to the classes of transcription start sites identified by CAGE. Transcription start sites were divided into single peak, broad, bimodal/multimodal, or broad with dominant peak classes. A significant bias for the single peak class was found in the immediate-early gene set (p = 4.4 × 10⁻⁷), whereas the delayed primary response genes showed only a moderate enrichment for the same class (p = 7.5 × 10⁻³) (Fig. 4D, and listed individually in supplemental Table 5). None of the other CAGE classes were significantly enriched in either primary response gene set. The single peak class represents transcripts with a single, well defined transcription start site. Although both primary response gene classes show enrichment of the single peak class, nearly all of the annotated immediate-early genes (15 of 18) were designated as single peak, whereas only about half (11 of 21) of the delayed primary response genes had such an annotation. These results indicate that the immediate-early genes may have a greater tendency to initiate transcription from a well defined initiation site than delayed primary response genes or other genes in the data base. This is also consistent with the observed enrichment of TATA boxes in immediate-early gene promoters.
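As an illustration of the one-sided Fisher's exact test named above, the following minimal sketch computes the hypergeometric tail probability for a 2x2 table. The reported p values compare each gene set with a data base background whose counts are not given in this section, so the example instead compares the two gene sets directly using the single peak counts quoted above (15 of 18 versus 11 of 21). That direct comparison and the function name `fisher_exact_greater` are assumptions made for illustration, not the analysis performed in the study.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a) under the hypergeometric null with all margins fixed,
    i.e. the probability that row 1 holds at least `a` of the column-1 items.
    """
    row1 = a + b                  # size of the first gene set
    col1 = a + c                  # genes with the feature, both sets combined
    total = a + b + c + d
    upper = min(row1, col1)       # largest possible count in the top-left cell
    tail = sum(comb(col1, k) * comb(total - col1, row1 - k)
               for k in range(a, upper + 1))
    return tail / comb(total, row1)


# Illustrative comparison (an assumption, not the study's test):
# single peak vs. other CAGE classes, immediate-early vs. delayed primary.
p_single_peak = fisher_exact_greater(15, 3, 11, 10)
print(f"one-sided p = {p_single_peak:.3g}")
```

The function uses only the standard library; `math.comb` returns 0 when the lower index exceeds the upper one, so impossible tables contribute nothing to the sum.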

RNA Polymerase II Occupancy at Promoters of Immediate-Early and Delayed Primary Response Genes-Identification of distinct transcription factor binding site enrichment and TATA box abundance upstream of the immediate-early genes suggests that the lag in delayed primary response gene expression may result from slower transcription initiation rates, which could be a consequence of RNA polymerase II (pol II) abundance and/or recruitment at target gene promoters. To investigate this further, pol II binding to promoters of immediate-early and delayed primary response genes was investigated by chromatin immunoprecipitation (ChIP) analysis.
Quiescent T98G cells were treated with PDGF for 0-4 h and subjected to pol II ChIP, using an antibody against the N terminus of pol II so that recognition of pol II was not affected by modifications of its C-terminal domain. pol II occupancy was examined at the transcription start sites of 11 immediate-early (Fig. 5A) and 19 delayed primary response genes (Fig. 5B). All of these genes had pol II occupancy above that observed at the nontranscribed β-globin gene (Fig. 5), as well as several other negative control genes (data not shown). For the majority of genes (73% of immediate-early genes and 68% of delayed primary response genes), pol II occupancy did not change upon PDGF stimulation, suggesting a post-polymerase recruitment mechanism may be responsible for gene induction. Preloaded pol II at transcription start sites is not unprecedented, as FOS and MYC are well established examples of genes with a paused polymerase in their proximal promoter regions in unstimulated cells (31).
For the three immediate-early genes that exhibited an increase (>1.75-fold) in pol II promoter occupancy upon PDGF treatment, all three had maximum pol II occupancy at 0.5 h of PDGF treatment, coincident with their mRNA inductions (Fig. 5A). For the delayed primary response genes that had increased pol II occupancy upon PDGF treatment, three of six (DKK1, DDX21, and ESDN) had peak pol II occupancy after 2 h of PDGF treatment (Fig. 5B), consistent with the possibility that delayed recruitment of pol II may play a part in their delayed mRNA inductions. However, the other three of these six delayed primary response genes (VCL, TGFB2, and EPHA2) demonstrated peak pol II occupancy after only 0.5 h of PDGF treatment, similar to what was observed for some immediate-early genes and much earlier than their mRNA inductions. This suggests that the delay in mRNA induction for these genes occurs after the recruitment of pol II.
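The classification used above (occupancy counted as induced when it rises more than 1.75-fold over untreated cells, with the time of peak occupancy then noted) can be sketched as follows. The function name, the data layout, and the numeric values are hypothetical placeholders, not measurements from the study.

```python
def classify_occupancy(percent_input, threshold=1.75):
    """Classify one gene's pol II ChIP time course.

    `percent_input` maps treatment time in hours (0 = untreated) to the
    ChIP signal expressed as a percentage of input.
    """
    baseline = percent_input[0]
    treated = {t: v for t, v in percent_input.items() if t != 0}
    peak_time = max(treated, key=treated.get)   # treated time point with highest signal
    fold = treated[peak_time] / baseline
    induced = fold > threshold
    return {
        "induced": induced,
        "fold": fold,
        "peak_time_h": peak_time if induced else None,
    }


# Hypothetical time courses (placeholders, not data from the study).
time_courses = {
    "gene_A": {0: 0.4, 0.5: 1.0, 2: 0.6, 4: 0.5},   # early increase
    "gene_B": {0: 0.5, 0.5: 0.6, 2: 0.55, 4: 0.5},  # essentially unchanged
}
results = {gene: classify_occupancy(tc) for gene, tc in time_courses.items()}
for gene, res in results.items():
    print(gene, res)
```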
Although pol II recruitment does not appear to be a major factor resulting in delayed mRNA inductions, there was a clear difference in pol II occupancy between the immediate-early and delayed primary response gene groups. For both untreated cells (Fig. 5C) and for the time point at which maximum pol II occupancy was observed (Fig. 5D), the immediate-early genes had significantly higher pol II occupancy than the delayed primary response genes (p = 0.026 for untreated cells and p = 0.0017 at the time of maximum pol II occupancy), possibly correlating with the differences in promoters and transcription start sites between these two groups of genes.
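The group comparison above relies on the Wilcoxon rank sum test (see the legend to Fig. 5). A minimal sketch of that test is given below, using average ranks for ties and a normal approximation without continuity correction. The two-sided alternative, the approximation, and the occupancy values are assumptions made for illustration; the section does not state which variant was used, and the numbers are not data from the study.

```python
from math import erfc, sqrt


def rank_sum_test(x, y):
    """Wilcoxon rank sum (Mann-Whitney) test, two-sided, normal approximation.

    Returns (U statistic for x, p value). Ties receive average ranks and the
    variance is tie-corrected. Assumes the pooled values are not all equal.
    """
    pooled = sorted((v, g) for g, vals in enumerate((x, y)) for v in vals)
    n1, n2 = len(x), len(y)
    n = n1 + n2
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1                               # [i, j) is one block of tied values
        avg_rank = (i + 1 + j) / 2               # mean of ranks i+1 .. j
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u - mean_u) / sqrt(var_u)
    return u, erfc(abs(z) / sqrt(2))             # two-sided tail of the normal


# Hypothetical occupancy values in % of input (placeholders only).
ieg_occupancy = [1.2, 0.9, 1.5, 0.8, 1.1]
dprg_occupancy = [0.4, 0.6, 0.3, 0.7, 0.5, 0.9]
u_stat, p_value = rank_sum_test(ieg_occupancy, dprg_occupancy)
print(f"U = {u_stat}, p = {p_value:.3g}")
```

For samples as small as the gene sets examined here, an exact test would normally be preferred over the normal approximation; the approximation is used only to keep the sketch short.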
Analysis of hnRNA Transcription-Because ~70% of the genes tested by ChIP did not demonstrate a change in pol II occupancy upon PDGF treatment, it is possible that transcriptional changes are not responsible for the observed mRNA inductions. To test this, hnRNA levels were measured with real time RT-PCR using 5′-biased intron-specific primers (32) for 23 delayed primary response genes following PDGF stimulation (supplemental Table 6). For all of the genes tested, hnRNA levels increased ≥2-fold and were similar to or greater than the corresponding mRNA inductions upon PDGF treatment. Thus, although pol II occupancy at these genes is not altered, their transcription is induced upon PDGF treatment.
FIGURE 5. RNA polymerase II occupancy at the promoter regions of immediate-early and delayed primary response genes. RNA polymerase II ChIP assays were conducted on quiescent T98G cells and on cells that were treated for 0.5, 2, and 4 h with PDGF. ChIP primers were designed at the transcription start sites of the corresponding genes, and immunoprecipitated material was plotted as a percentage of input ± S.E. Results are averages of 2-4 determinations. Data for immediate-early genes are presented in A and for delayed primary response genes in B. Brackets indicate genes whose pol II occupancy was induced >1.75-fold upon PDGF treatment. β-globin was used as the negative control. Other negative controls including ACTC, MYOD1, and MYOG (all muscle-specific genes) yielded similar results (not shown). Data for pol II occupancy at immediate-early compared with delayed response gene promoters is presented as box plots for untreated cells (C) and at the time of maximum pol II occupancy (D). p values were derived using the Wilcoxon rank sum test.
The kinetics of hnRNA synthesis for 12 representative delayed primary response genes and two immediate-early genes (FOS and MCL1) are presented in Fig. 6A. The hnRNA levels for FOS and MCL1 (Fig. 6A, panel 1) peaked at 15 min of PDGF treatment, corresponding to maximum mRNA induction at 30 min. Kinetics of hnRNA induction varied between different delayed primary response genes, which were categorized into three groups with examples shown in Fig. 6A, panels 2-4. For some of these genes, the accumulation of hnRNA and mRNA in the same experiment is compared in Fig. 6B.
Seven of the delayed primary response genes (30%) exhibited a clear delay in hnRNA synthesis, not reaching peak hnRNA levels until 1-3 h of PDGF treatment (Fig. 6A, panel 4). For these genes, the delay in transcription of hnRNA was consistent with their delayed mRNA expression (Fig. 2 and Fig. 6B). Two of these genes, DKK1 and ESDN, also have delayed pol II recruitment with peak occupancy at 2 h (see Fig. 5B).
In contrast, five delayed primary response genes (22%), including VCL, responded with transcriptional kinetics similar to immediate-early genes, with peak induction of hnRNA at 15 min of PDGF treatment (Fig. 6A, panel 2). These kinetics of VCL hnRNA synthesis are in agreement with the pol II ChIP results, which revealed maximum pol II occupancy at the VCL promoter following 30 min of PDGF treatment (Fig. 5B). The hnRNA levels of these genes increased much earlier than their mRNA levels (Fig. 2 and Fig. 6B), indicating that mechanisms following the initiation of transcription and the start of productive elongation are resulting in delayed mRNA induction.
A third group of 11 delayed primary response genes (48%) (Fig. 6A, panel 3) showed intermediate delays in transcription of their hnRNAs. These genes are less clearly defined in terms of kinetics of transcription, although it is likely that steps both preceding and following the start of productive elongation may play a role in their delayed mRNA inductions.
Primary Transcript Feature Analysis-The above results indicated that although delays in transcription initiation and/or the start of productive elongation contributed to the lag in expression of some delayed primary response genes, other differences in transcription or processing also contributed to the delay in mRNA formation. We therefore analyzed other features that might affect rates of transcription or mRNA processing, including primary transcript length and intron/exon structure.
Because processing of pre-mRNA is a potential rate-limiting step in gene expression, variations in 5′ donor and 3′ acceptor splice sites could indicate a general difference in splicing efficiency between the classes of primary response genes. We therefore compared the 5′ and 3′ splice site nucleotide compositions for the immediate-early and delayed primary response genes. However, there was no significant difference between the splice site characteristics of these groups of genes (supplemental Fig. 2). Next, the primary transcript length and exon frequency distributions for immediate-early and delayed primary response genes were compared with each other and to a distribution of all genes in the Entrez Gene data base (Fig. 7).
The analysis indicated a significant difference in both the primary transcript length (p = 4.2 × 10⁻⁸) and exon frequency (p = 1.4 × 10⁻⁴) distributions of immediate-early genes relative to the genome-wide distributions (Fig. 7) (see supplemental Table 5 for individual genes). In contrast, no significant differences were noted when these features of delayed primary response genes were compared with the genome-wide distribution. Furthermore, the immediate-early primary transcripts were significantly shorter than those of the delayed primary response genes (on average, ~19 kb versus ~58 kb, respectively, p = 2.5 × 10⁻⁹) and contained significantly fewer exons (on average, 5.8 versus 10.4, respectively, p = 1.4 × 10⁻⁴). These results suggested that, in addition to other gene features, the observed lag in mature mRNA induction of some delayed primary response genes may be related to both primary transcript length and exon frequency.

DISCUSSION
In this study, we have undertaken a comprehensive global analysis of the time course of gene induction following growth factor stimulation of quiescent human cells. As expected, we identified both rapid and delayed gene inductions resulting from PDGF stimulation. Forty-nine genes were induced within 30 min of stimulation, as expected for immediate-early genes, whereas 84 genes required 2-4 h of PDGF stimulation for maximum induction. Surprisingly, we found that the majority of the genes induced with delayed kinetics (58/84) were primary response genes, because their induction was not inhibited by cycloheximide. The transcriptional program induced by growth factor stimulation thus involved three distinct classes of genes: immediate-early genes, delayed primary response genes, and secondary response genes, which accounted for ~37, 44, and 19% of the genes induced within 4 h of PDGF stimulation, respectively. Similar kinetics of induction of representative delayed primary response genes were observed in response to the alternative mitogens EGF and serum, suggesting that their induction kinetics are not PDGF-specific events. Examples of delayed primary response genes have been observed by others in primary human fibroblasts (3,6,29), rat arterial smooth muscle cells (4), and mouse 3T3 cells (5,7,8), but the large number of primary response genes we found to be induced with such delayed kinetics was unexpected, suggesting a more complex regulatory landscape in mammalian cells.
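The class proportions quoted above follow directly from the gene counts, as the short tally below shows. The counts (49, 58, and 26 of 133) are taken from the text; the variable names are illustrative only. To one decimal place the shares are 36.8, 43.6, and 19.5%, which the text reports as approximately 37, 44, and 19%.

```python
# Gene counts per class as stated in the text (133 genes induced within 4 h).
class_counts = {
    "immediate-early": 49,
    "delayed primary response": 58,
    "secondary response": 26,
}

total_induced = sum(class_counts.values())
delayed_total = (class_counts["delayed primary response"]
                 + class_counts["secondary response"])

# Percentage of all induced genes in each class, to one decimal place.
class_percent = {name: round(100 * n / total_induced, 1)
                 for name, n in class_counts.items()}

# Share of delayed-kinetics genes that did not require protein synthesis.
primary_share_of_delayed = round(
    100 * class_counts["delayed primary response"] / delayed_total, 1)

print(total_induced, delayed_total)
print(class_percent)
print(primary_share_of_delayed)
```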
Transcriptional programs are often represented as gene networks, where products of expressed genes activate or repress secondary downstream gene targets. Many analyses assume temporal regulation according to the canonical immediate-early/secondary response gene paradigm to infer protein-gene interactions from correlations in gene expression data (33). By highlighting the unexpectedly high incidence of delayed primary response genes, our results have broad implications for analyses that infer regulatory interactions from temporal correlations in gene expression. Because many genes that are induced with a significant lag after growth factor stimulation are still primary response genes, it cannot be assumed that temporally delayed gene expression requires the prior induction of upstream transcriptional regulators.
Because delayed primary response genes represented a major component of the transcriptional response to growth factor stimulation, we used both computational and experimental tools to elucidate the properties of this group of genes. We first sought to determine whether the delayed primary response genes shared similar functions with the immediate-early genes. Therefore, the functional classifications of the immediate-early and delayed primary response genes were compared using the GO data base. The immediate-early genes were enriched in molecular function terms related to transcriptional regulation. This corresponded well with their recognized role as transcriptional effectors in the induction of secondary response genes. In contrast, the delayed primary response genes were not enriched in functions related to transcriptional regulation and had no significant functional overlap with the immediate-early genes. These comparisons suggest that the products of immediate-early genes may have unique functions in regulating the transcriptional response to growth stimulation, whereas the delayed primary and secondary response genes may function as effectors of this transcriptional program. In this regard, it is noteworthy that cyclin D1 was initially described as a secondary response gene in macrophages, whose induction linked cell cycle proliferation to growth factor stimulation (34). However, cyclin D1 behaved as a delayed primary response gene in the present study, as well as in 3T3 cells (8) and human fibroblasts (6).
We also examined the basis for the distinct kinetics of induction of immediate-early and delayed primary response gene mRNAs. Analysis of hnRNA demonstrated that both immediate-early and delayed primary response genes were induced at the transcriptional level. The hnRNAs of immediate-early genes were rapidly induced, coincident with the rapid inductions of their mRNAs. The lag in induction of a number of delayed primary response mRNAs appeared to result from either a delay in transcription initiation or the start of productive elongation, as suggested by the delayed inductions of their hnRNAs. In contrast, hnRNAs of other delayed primary response genes were rapidly induced, suggesting that the lag in mRNA induction resulted from delays in subsequent stages of transcriptional elongation or processing. These differences between the kinetics of induction of immediate-early and delayed primary response gene mRNAs appear to be associated with a combination of factors, including the over-representation of upstream binding sites for shared transcription factors, core promoter elements, gene length, and exon frequency.
FIGURE 7. Transcript length and exon frequency distributions for immediate-early and delayed primary response genes. Histograms illustrate the distribution of the minimum transcript lengths and minimum exon frequency distributions across all Entrez Gene transcripts, immediate-early genes, and delayed primary response genes. p values were calculated by the Wilcoxon rank sum test (two-sided for comparison of either D-PRG or IEG to Entrez Gene, and one-sided for comparison of IEG to D-PRG). The differences between the delayed primary response genes and Entrez Gene were not significant for either exon frequency or transcript length. Analysis of maximum transcript lengths and maximum exon frequency distributions yielded similar results.
Computational comparisons revealed striking differences in the prevalence of predicted binding sites for shared transcription factors in the upstream regions of immediate-early and delayed primary response genes. Binding sites for several known regulators, including SRF, AP-1, CREB, KROX, and NF-κB, were over-represented in the upstream regions of immediate-early genes compared with other genes that were expressed in T98G cells but not induced by PDGF. In contrast, binding sites for either these or other transcription factors were not significantly over-represented upstream of the delayed primary response genes. The absence of predicted binding site enrichment upstream of the delayed primary response genes may indicate that, although immediate-early genes are activated by a shared set of transcription factors, the delayed primary response genes are controlled by a more diverse set of regulators, which would not be identified as over-represented in the gene set. Alternatively, it is possible that delayed primary response genes contain fewer clusters of transcription factor binding sites near their promoters than immediate-early genes, or that the transcription factor binding sites upstream of delayed primary response genes are lower affinity sites than those upstream of immediate-early genes, because lower affinity sites that are divergent from the binding site matrix might not be scored in the computational analysis. Both of these factors could reduce the affinity of transcription factor binding to the promoter regions of delayed primary response genes, correspondingly reducing their rates of transcriptional activation.
The core promoters of the immediate-early genes also differed from those of the delayed primary response genes. In particular, promoters of the immediate-early genes contained higher affinity TATA boxes than those of the delayed primary response genes. Similarly, the prevalence of TATA boxes in the promoters of immediate-early genes (59%) was significantly higher than in the promoters of delayed primary response genes (34%) or in all genes in the genome (22%). This may have important implications in transcription initiation, with higher affinity TATA boxes conferring greater transcriptional activity on the promoters of immediate-early genes. Reinforcing the notion that the immediate-early genes have stronger, more defined initiation is the demonstration that these genes also have a significant bias for the single peak promoter class defined by CAGE analysis (25). Moreover, because some components of the transcription initiation complex, including TBP, remain bound to DNA following pol II promoter clearance, the stability of these factors may modulate the transcription reinitiation rate. Thus, high scoring TATA boxes present in immediate-early promoters may represent higher affinity TBP-binding sites that confer rapid reinitiation (35). Indeed, previous work demonstrated instability of TBP-TATA interactions following the first round of transcription (36), and noncanonical TATA box sequences diminish binding of TFIIA (37), a general transcription factor that is thought to stabilize the TBP-TATA complex (38).
The differences in both upstream transcription factor binding sites and core promoters are also consistent with differences in the binding of RNA polymerase II to the promoter regions of immediate-early and delayed primary response genes. Chromatin immunoprecipitation indicated that pol II was bound to the promoters of both immediate-early and delayed primary response genes in unstimulated cells, and that pol II occupancy increased on the promoters of about one-third of the genes in both groups following growth factor stimulation. Thus, transcriptional induction of the majority of immediate-early and delayed primary response genes may result from the start of productive elongation by a paused polymerase, rather than by recruitment of pol II to the preinitiation complex. These findings are consistent with previous demonstrations of paused polymerases near the transcription start sites of immediate-early genes, including FOS and MYC (31), as well as with global analyses that have detected preinitiation complexes at the promoters of many nontranscribed genes in human cells (39). Importantly, however, the amount of pol II bound to the promoters of immediate-early genes was significantly greater than that bound to the promoters of delayed primary response genes. These differences in pol II occupancy highlight a key distinction between the immediate-early and delayed primary response gene promoters. Together with the differences in both upstream transcription factor binding sites and TATA boxes, these findings point to transcription initiation, and perhaps reinitiation, as one of the primary mechanisms for rapid responses of immediate-early genes to growth factor stimulation relative to the delayed primary response genes.
Our analysis also revealed significant differences between the immediate-early and delayed primary response genes in both primary transcript lengths and exon frequencies. The immediate-early genes tend to be shorter and contain fewer exons than the delayed primary response genes, which are similar in length and exon frequency to other genes in the genome. These transcript features may contribute significantly to the lag in mRNA expression of delayed primary response genes, particularly for those genes that displayed a rapid induction of transcription, as detected by hnRNA. VCL provides an extreme example of the possible effect of primary transcript length and exon frequency on kinetics of mRNA expression. Analysis of hnRNA established that transcription of VCL was rapidly initiated, similar to immediate-early genes, such as FOS and MCL1. Consistent with its rapid transcriptional induction, SRF has been reported to be a key inducer of VCL (40). However, the accumulation of VCL mRNA was delayed by 2-3 h compared with the hnRNA. This lag in mature VCL mRNA production may be explained by the 122-kb primary transcript length, which is more than six times the average immediate-early gene primary transcript length, and the presence of 22 exons, which is almost four times the average number of immediate-early gene exons. At the other extreme, DKK1 (another delayed primary response gene) has a primary transcript of only 3.3 kb containing four exons, comparable with that of the shortest immediate-early genes. In contrast to VCL, transcriptional induction of DKK1 is delayed for 2-3 h after growth factor stimulation, coincident with increased pol II occupancy at its promoter. DKK1 may therefore represent an example of a gene whose delayed induction results primarily from a lag in pol II recruitment and transcription initiation.
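The size comparisons in this paragraph can be checked with a few lines of arithmetic. The sketch below uses the figures quoted in the text (a mean immediate-early primary transcript of about 19 kb with 5.8 exons, VCL at 122 kb with 22 exons, and DKK1 at 3.3 kb with four exons); the variable names are illustrative only.

```python
# Approximate immediate-early gene averages quoted in the text.
ieg_mean_length_kb = 19.0
ieg_mean_exons = 5.8

# Primary transcript length (kb) and exon count for the two example genes.
example_genes = {
    "VCL": (122.0, 22),
    "DKK1": (3.3, 4),
}

# Ratio of each gene's length and exon count to the immediate-early averages.
ratios = {
    name: (length_kb / ieg_mean_length_kb, exons / ieg_mean_exons)
    for name, (length_kb, exons) in example_genes.items()
}

for name, (length_ratio, exon_ratio) in ratios.items():
    print(f"{name}: {length_ratio:.1f}x length, {exon_ratio:.1f}x exons")
```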
Multiple differences between immediate-early and delayed primary response genes thus appear to contribute to the distinct kinetics of induction of their mRNAs. The immediate-early genes are characterized by over-representation of binding sites for several transcription factors in their upstream regions, promoters with high affinity TATA boxes, and short primary transcripts containing relatively few exons. In all of these respects, the delayed primary response genes are similar to other genes in the genome. Additional features, such as chromatin structure, may also distinguish immediate-early from delayed primary response genes, as has been reported for genes displaying rapid versus delayed inductions in response to other stimuli (41-43).
To determine whether these characteristics of immediate-early genes were consistent in other cell types, we analyzed the features of immediate-early genes induced by the mitogenic stimuli EGF in HeLa cells and serum in MCF10A cells (normal human breast epithelial cells) in published data sets (44). As in T98G cells, the immediate-early genes induced in both HeLa and MCF10A cells showed an over-representation of transcription factor binding sites, including sites for SRF, AP-1, CREB, KROX, and NF-κB, which were conserved in mouse and dog (supplemental Table 7; for complete TRANSFAC output, see supplemental Table 8). Likewise, immediate-early genes in HeLa and MCF10A cells had significantly higher TATA scores, lower exon frequencies, and shorter transcript lengths as compared with the genome as a whole (supplemental Figs. 3 and 4). Thus, the immediate-early genes induced in T98G, HeLa, and MCF10A cells by three different mitogens share common characteristics of genomic organization.
The multiple features associated with rapid induction of immediate-early genes may have been selected for based on the functions of immediate-early gene products as transcriptional regulators that mediate subsequent alterations in gene expression in response to growth factor stimulation. The rapid induction of immediate-early genes might be expected to play an important role in achieving a robust cellular response to extracellular signals. In contrast, the lag in induction of both delayed primary and secondary response genes is consistent with the apparent functions of these genes as effectors rather than mediators of growth factor signaling. Thus, immediate-early genes are not only characterized by a lack of requirement for new protein synthesis prior to their transcriptional induction, they also possess distinct genomic features that may have been selected to confer rapid inducibility.