Circadian regulation of diverse gene products revealed by mRNA expression profiling of synchronized fibroblasts.

Genes under a 24-h regulation period may represent drug targets relevant to diseases involving circadian dysfunctions. As a testing model of the circadian clock system, we have used synchronized rat fibroblasts that are known to express at least six genes in a circadian fashion. We have determined the expression patterns of 9957 transcripts every 4 h over a total period of 76 h using high density oligonucleotide microarrays. The spectral analysis of our mRNA profiling data indicated that approximately 2% (85 genes) of all expressed genes followed a robust circadian pattern. We have confirmed the circadian expression of previously known clock or clock-driven genes, and we identified 81 novel circadian genes. The majority of the circadian-regulated gene products are known and are involved in diverse cellular functions. We have classified these circadian genes in seven clusters according to their phase of cycling. Our pathway analysis of the mRNA profiling data strongly suggests a direct link between circadian rhythm and cell cycle.

Genes under a 24-h regulation period may represent drug targets relevant to diseases involving circadian dysfunctions. As a testing model of the circadian clock system, we have used synchronized rat fibroblasts that are known to express at least six genes in a circadian fashion. We have determined the expression patterns of 9957 transcripts every 4 h over a total period of 76 h using high density oligonucleotide microarrays. The spectral analysis of our mRNA profiling data indicated that ϳ2% (85 genes) of all expressed genes followed a robust circadian pattern. We have confirmed the circadian expression of previously known clock or clockdriven genes, and we identified 81 novel circadian genes. The majority of the circadian-regulated gene products are known and are involved in diverse cellular functions. We have classified these circadian genes in seven clusters according to their phase of cycling. Our pathway analysis of the mRNA profiling data strongly suggests a direct link between circadian rhythm and cell cycle.
The circadian rhythm allows organisms to anticipate daily environmental changes, and evolutionary conserved molecular clock and oscillator mechanisms have been observed in plants, fungi, and mammals (1). Circadian oscillators are generated by a small number of clock proteins that are regulated by selfsustained transcriptional-translational feedback loops, which drive the rhythmic expression of "clock-controlled" output genes. In mammals, the central pacemaker is located in the suprachiasmatic nucleus (SCN) 1 of the hypothalamus and is adjusted and entrained to the 24-h environmental light/dark cycle by photic stimuli via the retina. Surprisingly, confluent cultured fibroblast cells show an identical (2) circadian clock mechanism following synchronization with a 50% v/v serum shock treatment (3) and can be used as a simple model to study circadian gene expression. The discovery of a circadian clock in peripheral cells strongly suggests that this rhythmic mechanism controls basic cellular functions. So far the extent of gene expression following a circadian rhythm in cultured fibroblasts is still unknown, as well as its overall nature. We report on fibroblast experiments using high density oligonucleotide microarray expression profiling (GeneChip®, Affymetrix) to determine the expression levels of 9957 rat genes over a 76-h period after synchronization.

EXPERIMENTAL PROCEDURES
Sample Preparation and Chip Hybridization-Rat 3Y1 embryonic fibroblasts (passage 30) (4) were grown in Dulbecco's modified Eagle's medium with 5% v/v fetal calf serum (Roche Molecular Biochemicals), 2 mM L-glutamine, 100 IU/ml penicillin, and 100 mg/ml streptomycin (Life Technologies, Inc.) at 37°C in 7.5% CO 2 and were plated (in duplicate) on 15-cm dishes at 3 ϫ 10 6 cells/plate and grown in the same medium for 7 days. The cells were confluent for 4 days before serum synchronization. At this time (t ϭ 0 h), cells were treated with 50% v/v horse serum (Roche Molecular Biochemicals) for 2 h and then kept in serum-free Dulbecco's modified Eagle's medium (3). Every 4 h, the content of two plates was lysed in 10 ml of RNAzolB TM (AMS Biotechnology) and total RNA extracted following the manufacturer's protocol. Target preparation for chip hybridization was done as described (5); briefly, 20 g of total RNA was used as starting material, and 30 g of in vitro transcribed biotinylated target was fragmented, applied to the chip, and hybridized overnight at 45°C. Four spiked controls were included in the hybridization mix (BioB, BioC, BioD, and Cre at 1.5, 5, 25, and 100 pM, respectively). We used Affymetrix high density arrays, custom made for Hoffmann-La Roche, testing a total of 12,204 rat sequences on 2 arrays: 2840 public sequences present in GenBank TM as of June, 1998, and 9364 proprietary ESTs (Incyte Pharmaceuticals). After washing at 45°C and antibody amplification, the arrays were stained and scanned on a Hewlett-Packard scanner. The raw data were exported from the "GeneChip" software program (Affymetrix) and subsequently analyzed with ad hoc programs.
Expression Data Analysis-To allow for comparisons between arrays, the total signal intensities (sum of signal intensities of all individual genes on each chip) were normalized to the mean signal intensity of all chips.
To detect periodic variations in gene expression, we modeled the time series according to a second order autoregressive process (2,6), and because outliers may be present, a robust estimation procedure for the coefficients was used (7,8). Moreover, before applying the spectral analysis method, a robust linear regression ("least trimmed squares robust regression") (9) was employed to remove any linear trend from the data corresponding to a drift of the base line. The spectral analysis was then applied to the residuals. For each time series, we calculated the frequency V max where the spectrum f reaches its maximum; with this method non-cycling transcripts would have a frequency of 0.
To select genes with a strong circadian expression pattern, the curves from the 4006 cycling time series identified by spectral analysis were plotted in Excel (Microsoft) (data not shown). The analysis was eased by the fact that it was a time course experiment. Changes in expression levels were regarded as significant, if they were present at successive time points in the same direction, followed by changes in the other direction over several subsequent time points. Genes were scored from 1 to 4 by four independent investigators. The highest mark (ϭ4) went to genes with at least two peaks of maximal expression, a cycle period of 20 -24 h, and high reproducibility between replicates. All the genes that got a mark between 2 and 4 ere selected for further analysis RNase Protection-The rev-erb␣ probe corresponds to a 230-bp frag-* This work was supported by ARC, LNCC, FRM, CNRS, MENRT, and ENS Lyon. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.   (10). The 36B4 control probe corresponds to a 119-bp fragment (residues 24 -143) of the rat acidic ribosomal phosphoprotein P0 cDNA (GenBank accession number X15096). RNase protection assay was performed as described (11). For each time point, 10 g of total RNA (also used for chip target preparation) was hybridized overnight at 58°C with 150 and 300 pg of the rev-erb␣-and 36B4-labeled probes, respectively. Signals were quantified with a STORM 860 bio-analyzer (Molecular Dynamics).
Samples contained 1ϫ TaqMan Buffer A, 4% glycerol, 5 mM MgCl 2 , 200 M each dATP, dCTP, and dGTP, 400 M dUTP, 0.05 units of AmpliTaq Gold polymerase, 0.01 units of AmpErase UNG, 200 nM primers, and 100 nM TaqMan probe in a 25-l volume or Faststart DNA master SYBR green I mix (Roche Molecular Biochemicals), with 3 mM MgCl 2 , 500 nM primer each in a 20-l volume. The PCR conditions were as follow: with the ABI Prism 7700, 2 min at 50°C, 10 min at 95°C, then 40 cycles of 15 s at 95°C, and 1 min at 60°C. With the light cycler 10 min at 95°C, then 10 s 95°C, 5 s 55°C, 15 s 72°C for 40 cycles, followed by a melting curve analysis. The expression levels were normalized either to the levels of glyceraldehyde-3-phosphate dehydrogenase or to the rat acidic ribosomal phosphoprotein P0 (GenBank TM accession number X15096). Relative mRNA abundance was calculated using the comparative Ct method or a standard curve method as described (12).  Confluent, non-dividing 3Y1 fibroblasts, were synchronized for 2 h with 50% v/v horse serum. After removal of the medium, fibroblasts were harvested every 4 h (independent duplicates), and total RNA was extracted and cRNA target prepared for hybridization on the microarrays. The 12,204 probe sets present on the microarrays do not represent an equivalent number of genes, because some probe sets possibly test the same gene. Between June 1998 and April 2001 3418 ESTs could be matched to newly sequenced genes, and from these ESTs 24% were matching the same gene. If this percentage is extrapolated to all ESTs, our array included an estimated number of 9957 independent genes.
The normalized mean intensity and standard deviation of hybridization signals for each gene was calculated for each time point. To perform expression-profiling analysis of the experimental data, we used a methodology based on spectral analysis of time series as follows: 4006 probe sets gave a periodic hybridization signal as analyzed by the autoregressive process; 5936 probe sets gave a constant hybridization signal over all time points; and 2263 had hybridization signals below the detection threshold indicative of non-expressed genes or undetectable transcripts. Because the above numerical analysis was based on the mean hybridization intensities (without considering standard deviations), a subsequent visual inspection of the data (see "Experimental Procedures") was indispensable to further identify transcripts with robust circadian expression patterns. By using these combined data analysis approaches, we have identified 86 transcripts with clear rhythmic and circadian expression profiles as shown in Table I (see also the Supplemental Material containing median hybridization values, first peak, period, and quality and Table II or the raw data excel file supp_cycling_raw_data.xls).
Among the 86 transcripts, 20 were known published sequences and 66 were ESTs from a private data base (Incyte). The EST sequences were used to search the public data base GenBank TM for further identification and updating: 22 ESTs were identical to sequences previously published in the data

FIG. 2. Qualitative assessment of the GeneChip® microarray technology by comparisons with well established approaches such as RNase protection and Q-PCR.
A, good reproducibility between the different technologies is shown by the normalized expression profiles of rev-erb␣ expression as measured by RNase protection, microarrays, and Q-PCR. dbp and cry1 were also confirmed by Q-PCR. Note that the Q-PCR time points 4 and 12 h were not determined. B, comparison of the expression profile obtained by microarray and Q-PCR for RING3 (ID04528), the 26 S proteasome subunit (ID06960), and a putative protein (ID08649). The Q-PCR time points correspond to the first minimum and maximum expression level for a given gene determined by the microarray technology. bases (BLAST E value ϭ 0) and 25 were similar to sequences in the data bases (BLAST E value ranging from 1E-178 to 3E-37, median ϭ 5E-142). Therefore, we could identify 67 cycling transcripts present in the public data base as follows: 61 coding for proteins with known function and 6 with unknown function. The 19 remaining sequences were ESTs without known sequence homology (Table I).
Four genes known to have an ϳ24-h cycle of expression both in vivo and in synchronized cell lines (3) are present on the microarray used here. In addition to rev-erb␣ and dbp, we identified cry1 and per2 transcripts after EST identification against known sequences. These two genes were recently shown to be essential protein components of the circadian clock (13,14). As shown in Fig. 1 and 2A, the four transcripts were cycling in our experiment. Among other cycling transcripts, we found two independent ESTs encoding the same product, DBP. The hybridization signals for both probe sets were almost identical, showing a common cycling expression pattern (data not shown). dbp was therefore tested twice with two separate probe sets, hence our identification of 86 sequences with circadian expression corresponds to 85 independent genes. Our data on five known cycling transcripts demonstrated the validity of our experimental set-up and our expression data analysis and indicated a good reproducibility of the microarray methodology used here.
Are the microarray data comparable with those obtained using other techniques such as RNase protection assay or real time quantitative reverse transcriptase-PCR (Q-PCR)? By RNase protection assays and Q-PCR, the rev-erb␣ transcript was shown to cycle with a pattern very similar to the one obtained from the microarray analysis ( Fig. 2A), and the same was true for dbp and cry1 analyzed by Q-PCR ( Fig. 2A). By using Q-PCR, we further confirmed the cycling pattern of eight independent transcripts as shown in Fig. 2B for RING3, ID06960, ID08649 and similar data not shown for per2, cyclin D3, ID01672, ID04871, and ID08242. This comparison clearly demonstrated that microarray data were comparable with those obtained with well established methodologies.
The most striking observation from our experimental data analysis was that circadian patterns of expression were observed for numerous transcripts that encode proteins belonging to several distinct functional classes. We found proteins known to be involved in housekeeping functions, transcription, translation, protein degradation, signaling, cell cycle, transport, and trafficking. These cycling transcripts encoded proteins such as enzymes, structural proteins, transcription factors, receptors, or ion channels ( Fig. 3 and Supplemental Material). Are there common features among these cycling gene transcripts such as their period or amplitude levels that could reflect direct interactions, common physiological responses, or specific pathway activations? Hierarchical clustering is widely used to extract patterns from gene expression profiles and to group genes accordingly to reduce the complexity of the data. We clustered the 85 cycling transcripts based on their expression profiles between t ϭ 0 and 48 h by a hierarchical correlation method (15). In this experiment, genes displayed the most robust cyclic expression pattern during this time window, and we hypothesized that this would lead to the creation of the most biologically relevant clusters. As shown in Fig. 4, A and B, seven main clusters accounting for 82 genes were identified. Four transcripts had expression patterns too different to belong to any cluster ( Fig. 4A and Table I). The individual expression profiles representative of each cluster are shown in Fig. 4B, whereas the median values of the relative hybridization intensities of all transcripts in each cluster are shown in Fig. 4C. This allows the visualization and comparison of the seven cycling patterns. Because clusters 4 and 5 are very similar except for a sharp peak at t ϭ 4 h in cluster 4, probably due to the serum addition, they could be considered as a single profile. In this case the first peak appearing after t ϭ 8 h in each of the six different clusters is shifted from one to the next by a time lag of 4 h, representing all possible phases of a 24-h rhythm. As expected, gene transcripts known to cycle in phase like rev-erb␣ and dbp were clustered together in cluster 3. Although all genes found in a given cluster were roughly cycling according to a circadian rhythm, they did not belong to one particular protein functional class (Table I). On the other hand, a given cluster may contain functionally related genes or genes whose products may have a higher likelihood of interaction. Here are some examples of such cycling transcripts. Cluster 7 contained both the transcription factor CBF-C (ID01672) that binds to the common CCAAT motif present in several mammalian promoters (16) and the transcription activator TAFII32 (ID10277), part of TFIID, a transcription initiation factor that binds TATA regulatory sequences (17). Cluster 3 contains two previously known clock or clock-driven genes (rev-erb␣, dbp), as well as the transcription factor FEZ1 (ID08023), a leucine-zipper protein.
Other genes were also expressed in a circadian fashion. Cluster 4 includes the glutamate receptor 4 (ID01171) transcript, and cluster 6 contains the serotonin 5-HT 5B receptor transcript (ID00746). The leukocyte common antigen-related phosphatase (ID00757 in cluster 5) is known to regulate the action of insulin by dephosphorylation of the insulin receptor (18). The endothelin-converting enzyme (ID05028 in cluster 6) is known to produce active endothelin that serves as a potent vasoconstrictor and possesses mitogenic properties, causing proliferation and hypertrophy of vascular smooth muscle, cardiac myocytes, bronchial smooth muscle, and fibroblasts (19). Our data suggest that these functions might be circadian. Similarly, the recently identified granin-like neuroendocrine peptide precur-sor (proSAAS) (20), present as ID10275 in cluster 4, could be a circadian-regulated endogenous inhibitor of prohormone convertase 1, the protease responsible for the processing of many neuroendocrine peptides such as proopiomelanocortin, the precursor of several signaling substances.
Moreover, we identified four cycling transcripts coding for proteins known to play a role in the regulation of the cell cycle. We found, in different clusters, the replication-dependent histone H2A1 (ID08025 in cluster 4), cyclin D3 (ID09966 in cluster 2), cell division protein kinase 4 CDK4 (ID06657 in cluster 1), and RING3 (ID04528 linked to cluster 5). Although the last three transcripts were not clustered together because of different profiles between t ϭ 0 and 12 h, they clearly showed the same phase of expression for the first two peaks at 24 and 48 h (Fig. 5). DISCUSSION What is the overall percentage of all expressed genes showing a circadian pattern of expression? Accurate gene expression studies rely on the maximal sensitivity and reproducibility of the methodology used to detect mRNA levels. The microarray protocol used here allowed us, in principle, to detect mRNA transcripts present at a frequency of 1/150,000, or equivalent to 2-6 mRNA molecules/cell, as estimated by the hybridization intensity of spiked transcripts (data not shown). Therefore, it should be able to detect the vast majority of transcripts that are typically found at a concentration of 1-10 copies/cell. The presence or absence of a given transcript in a sample was calculated using the present call algorithm of the Affymetrix software. 5,815 sequences, representing 4,361 independent genes, were called present in at least half of the time points. Because 85 independent transcripts are found cycling, it represents a frequency of ϳ2% of all present genes. By knowing how many genes are expressed in a single cell, one can extrapolate the total number of cycling transcripts. By using the SAGE technology, it has been previously estimated that a single cell expresses about 13,000 genes (21). Therefore, our data indicate that one should expect a total of about 260 genes to follow a circadian rhythm of expression after synchronization.
A recent paper (22) reported the identification of 51 circadian transcripts in liver cells in vivo by a differential display technique. 14 of these transcripts were present on our microarrays: four transcripts had hybridization signals below the detection threshold, and nine showed a transient expression peak between time 0 and 24 h as described by Kornmann et al. (22), but in our experiment they remained stable for the remaining observation period. Only dbp displayed a cyclic expression pattern for more than one cycle. This may indicate that these genes have a circadian expression profile in vivo that is not maintained in cultured cells.
The main difficulty we encountered in this study was to define statistically the cycling transcripts. We tested two methods to detect periodic signals, spectral analysis and autocorrelation. These methods were developed and are adequate to analyze electronic signals where at least 1000 time points can easily be sampled, but they show shortcomings in biological systems characterized by a higher variability, varying amplitude of the signal, and a small number of time points available for analysis. While detecting our positive controls, rev-erb␣, dbp, cry, and per as periodically oscillating, these methods were unable to score them as statistically significant. Therefore, we used these mathematical methods as periodicity detector that had to be followed by a qualitative analysis to take into account the standard deviation of the measurement. This may have resulted in a conservative estimation of the number of cycling transcripts.
As the clustering was obtained from a phenomenological data analysis, further molecular studies such as comparisons of gene promoter regulations will be necessary to reveal why those transcripts are co-regulated in a circadian manner. Although different levels of mRNA transcript do not always translate into equivalent changes in protein levels, our clustering data suggest that gene products with similar profiles might have a greater likelihood of interactions with each other than with those whose transcripts are cycling with a different phase. Also of interest is the circadian expression of the glutamate receptor 4 transcript in cluster 4, which is known to be expressed in the SCN (23) and of the serotonin 5-HT 5B receptor transcript in cluster 6. A circadian pattern of expression for the 5HT 2C receptor could be shown in the hippocampus (24) as well as for the GluR2/3 subunits of the ␣-amino-3-hydroxy-5-methyl-4-isoxazole propionic acid receptor in the SCN at the protein level (25). Glutamate and serotonin are thought to mediate and modulate light resetting of the mammalian clock, respectively (26,27). Circadian expression of these receptors may indicate that the input component of the clock is a target of the circadian oscillator. The most remarkable common physiological feature was observed with four genes implicated in the regulation of the cell cycle. Cyclin D3, CDK4 and RING3 showed the same pattern of expression between 12 and 56 h. Cyclin D3 and CDK4 are known to interact directly (cyclin-D3 activating CDK4) and to play an important role in the G 1 /S transition phase in the cell cycle (28). The cyclin-D3 protein amount is regulated at the transcriptional level, and its half-life is 25 min (29). Circadian rhythm and the cell cycle have been linked in vivo. In humans, circadian variation are observed in the number of progenitor cells detected in bone marrow (30), as well as in the level of expression of cell cycle-associated proteins (cyclins) in biopsy samples of the oral mucosa (31). A role for CDK4 on circadian rhythms was suggested previously (32), but it could not be determined if the effect on phase shifts of the circadian rhythm by CDK inhibitors were due to the direct inhibition of CDK or due to a general inhibition of transcription. RING3 is a novel serine-threonine kinase located in the nucleus, which can transactivate cyclin (D11, A, and E) and dihydrofolate reductase genes (33). RING3 is implicated in cell cycle, and its expression pattern is circadian and synchronous with those of cyclin D3 and CDK4 in our experiment. Together, these results indicate that transcripts of several proteins involved in the control of the cell cycle are regulated, in nondividing confluent cells, according to a circadian rhythm.
We conclude that, in synchronized fibroblasts, 2% of all gene transcripts show a circadian pattern of expression. The 85 independent cycling transcripts described and clustered here encode a surprisingly large number of functional protein classes, some representing potentially interesting drug targets for diseases involving the circadian clock such as depression, jetlag, insomnia, and other sleep disorders. Our data analysis also suggests a molecular link or pathway between the cell cycle and endogenous circadian cellular clock functions.