Identification of transcriptional networks during liver regeneration.

The molecular analysis of mammalian cellular proliferation in vivo is limited in most organ systems by the low turnover and/or the asynchronous nature of cell cycle progression. A notable exception is the partial hepatectomy model, in which quiescent hepatocytes reenter the cell cycle and progress in a synchronous fashion. Here we have exploited this model to identify regulatory networks operative in the mammalian cell cycle. We performed microarray-based expression profiling on livers 0-40 h post-hepatectomy corresponding to G0, G1, and S phases. Differentially expressed genes were identified using the statistical analysis program PaGE (Patterns from Gene Expression), which was highly accurate as confirmed by quantitative reverse transcription-PCR of randomly selected targets. A shift in the transcriptional program from genes involved in lipid and hormone biosynthesis in the quiescent liver to those contributing to cytoskeleton assembly and DNA synthesis in the proliferating liver was demonstrated by biological theme analysis. In a novel approach, we employed computational pathway analysis tools to identify specific regulatory networks operative at various stages of the cell cycle. This allowed us to identify a large cluster of genes controlling mitotic spindle assembly and checkpoint control at the 40-h time point as regulated at the mRNA level in vivo.

The liver is one of few organs that retains the capacity to rapidly respond to changes in mass and/or function in both humans and animals. This property has significant implications for a variety of clinical situations, including surgical removal of a portion of the liver, such as that which occurs following tumor resection or living-related liver transplantation and for recovery from fulminant liver failure. In contrast to the synchronous and extensive loss of liver mass that occurs in surgical contexts and acute liver failure, the majority of human liver diseases are characterized by repetitive inflammatory or toxic insults to the liver that are associated with loss of liver cells as the result of necrotic or apoptotic cell death. As a consequence of repetitive injury, proliferative signals to hepatocytes are associated with increased risk of hepatocellular carcinoma. However, the mechanisms that regulate both normal and pathologic proliferation are still poorly understood.
A widely used experimental model of hepatic growth is the partial hepatectomy model in rodents in which ϳ70% of the liver is removed and restoration of liver mass occurs within 10 -14 days (1). Within 30 min of the surgery, cytokine and growth factor stimulation leads to the activation of preexisting transcription factors, including NFB, Stat3, AP-1, and C/EBP␤, and induction of a large number of genes responsible for stimulating normally quiescent hepatocytes and nonparenchymal liver cells to reenter the cell cycle and ultimately restore the liver mass (2)(3)(4)(5). The genes that are either increased from basal levels of expression or are induced de novo encode proteins involved in maintaining homeostasis and stimulating cells to reenter the cell cycle and proliferate. In a model proposed by Fausto (2), there appear to be at least two distinct phases of liver regeneration that are regulated via different mechanisms: a priming phase in which quiescent hepatocytes are induced to reenter the cell cycle as a result of tumor necrosis factor ␣ and interleukin 6 stimulation, followed by a second phase in which hepatocytes respond to growth factors and progress through the G 1 stage of the cell cycle (2). The priming phase in the rodent hepatectomy model in which immediateearly genes are activated corresponds to the first 4 h posthepatectomy (6,7). The peak of DNA synthesis occurs ϳ40 h after resection in the mouse, and the period immediately preceding this is associated with the induction of cell cycle genes, including cyclin D1, fox M1b, cyclin-dependent kinases, and the cyclin-dependent kinase inhibitor p21 (also known as CDKN1A) (8 -12). Many of the individual genes involved in the priming phase of liver regeneration have been identified using mouse genetic models and more recently through the use of high density oligonucleotide arrays (6,(13)(14)(15)(16)(17)(18)(19)(20)(21)(22). However, information regarding the pattern of gene expression in the periods following this immediate early priming phase is still limited. In addition, as a consequence of the complexity and robust nature of the transcriptional changes that occur in response to partial hepatectomy, there remains much to be learned regarding the integration of the products of these genes into the complex signaling and transcriptional hierarchies that result in this highly synchronized regenerative response. In this study, we utilized a cDNA microarray enriched for genes expressed in hepatocytes to profile changes in gene expression in mouse liver cDNA samples isolated 0, 2, 16, and 40 h post-hepatectomy (23,27). We selected these time points to detect differentially expressed genes during the priming phase, mid-G 1 , and peak of S phase in the mouse partial hepatectomy model (see Fig. 1A).
Previous microarray studies performed using models of liver regeneration have produced lists of temporally regulated genes but have failed to establish how these genes may form regulatory networks (6, 18 -20). Moreover, these studies have predominantly identified metabolic factors and other highly expressed genes that are easily detected through microarray analysis. In the present study, we highlighted novel approaches to the analysis of microarray data, focusing specifically on key time points during the hepatocyte cell cycle that have previously been overlooked. We leveraged novel computational tools to identify themes and regulatory networks that govern hepatic proliferation in vivo. We demonstrated that this approach identifies even those genes in which expression levels are below the sensitivity limit of microarray analysis, including those encoding growth factors and DNA binding proteins. This paradigm will be generally applicable to other expression data sets of similar or higher complexity.

MATERIALS AND METHODS
Preparation of Samples-12-16-week-old C57BL6/SV129 mice were subjected to partial hepatectomy as described previously (22). The animals were sacrificed, and RNA was prepared from the remaining liver lobes taken from mice at 0, 2, 16, and 40 h post-hepatectomy by homogenization in guanidine isothyocyanate and purification by cesium chloride ultracentrifugation (25). All RNA samples were analyzed using an Agilent Bioanalyzer Lab-on-a-Chip Nano 6000 chip to determine the integrity and concentration of the samples. Only samples passing this quality control step with a ratio of the 28 -18 S RNA of Ն2.0 were used for expression analysis.
Microarray Expression Profiling and Data Analysis-20 g of total RNA was indirectly labeled using amino-allyl dUTP and an anchored oligo(dT) 20 to prime reverse transcription. Flourescent label (Cy™Dye, Amersham Biosciences) was coupled to the cDNA and hybridized to the PancChip version 5.0 13K cDNA microarray (23,27). For each array hybridization, labeled liver RNA from one of the four time points was hybridized with a common control RNA sample labeled with a different Cy TM Dye. The common control RNA comprised equal quantities of a mixture of RNAs taken from both wild type and C/EBP␤ Ϫ/Ϫ mutant mice (28) prepared from quiescent (0 h), 2, 16, 24, 30 40, and 48 h post-hepatectomy. This diverse pool of RNA ensured good expression coverage and was used to minimize the production of zero denominators in the calculation of expression ratios for the time course analysis.
Four biological replicates were used for each of the 0-, 2-, 16-, and 40-h time points. For each array hybridization, a dye-swap hybridization was performed, such that for each time point, two of the biological replicates were hybridized as test (Cy5, red) versus common control (Cy3, green) and the other two biological samples as common control (Cy5) versus test (Cy3). The replicate dye-swap analysis reduced the impact of dye bias or other labeling artifacts on the ratio of gene expression at a given time point. The median intensities of each spot on the array were measured by an Agilent Scanner using the GenePix version 5 software, and the ratio of expression for each element on the array was calculated in terms of M (log 2 (red/green)) and A ((log 2 (red) ϩ log 2 (green))/2)). The data were normalized by the print tip lowess method using the Statistical Microarray Analysis package in the software package R (29). For statistical analysis, the genes were identified as differentially expressed using the Patterns from Gene Expression package (PaGE 1 version 5.0) as described previously (30). Twoclass, unpaired data tests were also performed to specifically identify genes that were differentially expressed by more than 1.5-fold when comparing the different time points: 0-versus 2-h, 0-versus 16-h, and 0versus 40-h. In all cases, a false discovery rate of 0.2 was chosen to identify differentially expressed genes. The microarray data are available through the Minimum Information About a Microarray Experiment (MIAME)-compliant data base RAD (RNA abundance data base) (31) at www.cbil.upenn.edu/RAD. Quantitative Real-time Reverse Transcription PCR (QPCR)-Differential gene expression was confirmed using QPCR. Liver RNA (500 ng) was reverse transcribed at 42°C for 1 h with 1 g of oligo(dT) primer (Invitrogen) and Superscript II reverse transcriptase. PCR reaction mixes were assembled using the Brilliant® SYBR Green QPCR Master Mix (Stratagene). 10 M primers, 1 l of cDNA, and the included reference dye at a 1:200 dilution were prepared according to manufacturer's instructions. The reactions were performed using the SYBR Green (with Dissociation Curve) program on the Mx4000 TM Multiplex Quantitative PCR System (Stratagene). Cycling parameters were 95°C for 10 min and then 40 cycles of 95°C (30 s), 58°C or 60°C (1 min), and 72°C (30 s) followed by a melting curve analysis. All reactions were performed with four biological replicates and three technical replicates with reference dye normalization. The median cycle threshold value was used for analysis, and all cycle threshold values were normalized to the expression of the housekeeping gene TBP (confirmed not to be differentially expressed with microarray expression analysis). Primer sequences are available upon request. The Student's t test was used to confirm that the QPCR was significant and matched the direction of the -fold change predicted by the array. An unpaired t test using the one-tailed probability table with a p value significance cutoff of 0.05 was used for all genes tested.
Biological Theme Analysis-The software application Expression Analysis Systematic Explorer (EASE) was used to discover biologically relevant themes in the lists of genes identified as significantly differentially expressed at 2, 16, and 40 h post-hepatectomy (32). Briefly, EASE first maps Entrez Gene identifiers for all genes assayed using the PancChip-to-gene categories represented by the gene ontology function. Within various categorical systems supported by the gene ontology function, the "population total" is determined for each system of gene categorization (e.g. biological process or molecular function), and the "population hits" are determined for every category within those systems (e.g. hormone metabolism or protein biosynthesis). In a second step, the lists of genes differentially expressed at each time point are analyzed to determine which functional categories are over-represented. Third, the probability of obtaining the number of genes in a certain pathway in the list of differentially expressed genes is then compared with the representation of the same pathway among all the genes on the microarray and is calculated as the Fisher exact probability. This probability is then used to calculate the metric known as the "EASE score," which is a conservative adjustment to the Fisher exact probability that weights significance in favor of themes supported by more genes and therefore yields more robust results. To be able to perform the EASE analysis, the 11,681 of the 13,008 expressed sequence tag clones on the PancChip M5.0 were annotated with Entrez  gene identifiers (formerly Locus identifiers) using assembly annotation from the Data Base of Transcribed Sequences (DoTS version 9.0) (55) and annotation tools available at the Data Base for Annotation, Visualization and Integrated Discovery (DAVID) (56). The remaining expressed sequence tag clones were excluded from the analysis, as they represented novel transcripts, and these elements on the array could not be annotated with a gene identifier. To prevent false over-representation of a particular gene identifier, all duplicates were removed from the lists giving a population of 7,125 unique gene identifiers and 155, 259, and 479 unique gene identifiers representing genes differentially expressed by Ͼ1.5-fold at 2, 16, and 40 h, respectively. The data were analyzed for themes using the gene ontology cellular compartment, molecular function and biological process provided by NCBI (33). The lists of up-and down-regulated genes at each time point were analyzed with the categorical over-representation function of EASE, and the most significantly over-represented categories were identified using EASE scores. All significant (p Ͻ 0.05) categories resulting from each list produced the over-represented biological themes shown in supplemental Table 2S.
Pathway Analysis-Biologically relevant networks were drawn from the lists of genes that were differentially expressed at the 2-, 16-, and 40-h time points when compared with the 0-h time point. These data were generated through the use of Ingenuity Pathways Analysis, a web-delivered application (www.Ingenuity.com) that enables the visualization and analysis of biologically relevant networks to discover, visualize, and explore therapeutically relevant networks.
Expression data sets containing gene identifiers (Entrez gene identifiers) and their corresponding expression values as -fold changes were uploaded as a tab-delimited text file. Each gene identifier was mapped to its corresponding gene object in the Ingenuity Pathways Knowledge Base. The genes identified as differentially expressed by statistical analysis using PaGE described previously and which had a -fold change between any of the time points of Ͼ1.5 were included in the analysis. These genes, called "focus genes," were then used as the starting point for generating biological networks. To start building networks, the application program queries the Ingenuity Pathways Knowledge Base for interactions between focus genes and all other gene objects stored in the knowledge base and generates a set of networks. The program then computes a score for each network according to the fit of the network to the set of focus genes. The score indicates the likelihood of the focus genes in a given network being found together due to random chance. A score of Ͼ2 indicates that there is a Ͻ1 in 100 chance that the focus genes were assembled randomly into a network due to random chance.

Identification of Temporally Regulated Genes during Liver
Growth-To identify genes that were differentially expressed at specific phases of the regenerative process, liver RNA samples were obtained from mice 0, 2, 16, and 40 h after partial hepatectomy. These time points correspond to quiescent liver, early G 1 , mid-G 1 , and S phase of the hepatocyte cell cycle in this system (Fig. 1A). One significant challenge inherent to high throughput analysis of large scale changes in gene expression is the development of statistical methods that will maximize both the sensitivity and specificity of detection of differentially expressed genes. The PaGE statistical analysis package is a tool for analyzing microarray gene expression data useful for identifying expressed genes between two conditions and for generating patterns across multiple conditions (30). PaGE uses an algorithm that generates discrete patterns using a false discovery rate based on confidence measures. In the present study, we used a false discovery rate of 20% to maximize sensitivity without significantly impacting accuracy. To identify candidate genes involved in early growth as well as later phases of liver regeneration, we also determined which genes were differentially expressed between time points. Twoclass, unpaired tests were performed to identify genes that were differentially expressed by Ͼ1.5-fold at 2, 16, and 40 h post-hepatectomy compared with quiescent liver.
Using these criteria, a total of 641 genes were found to be differentially expressed (Table I and Fig. 1B). A sequential increase in the number of genes differentially expressed was observed with time after partial hepatectomy, with the com- parison between 0 and 40h producing the largest change in gene expression. Table II lists the top 10 genes by -fold change, up or down, for each of the comparisons made, whereas the complete list of affected genes is given in Table 1S (supplemental data). The groups of differentially expressed genes are shown schematically in the Venn diagram of Fig. 1B, again highlighting the fact that the largest number of genes were differentially regulated at 40 h post-hepatectomy versus 0 h. Statistical Analysis Using PaGE Is Highly Predictive of True Differences in Gene Expression-QPCR was used to determine how accurately the PaGE statistical analysis identified differentially expressed genes. 28 genes were chosen randomly from the 0 versus 40 h comparison. Of these 28 genes, 23 (82%) were found to be differentially expressed between 0 and 40 h posthepatectomy by QPCR (Fig. 2). Thus, PaGE is a highly accurate tool for the analysis of microarray data. The predictive power of the PaGE statistical analysis program was further highlighted by the correspondence between the preset 20% percent false discovery rate and the fact that ϳ20% of the genes showed no statistically significant change in gene expression by QPCR.
Identification of Global Programs Important for Homeostatic and Proliferative Responses in the Proliferating Liver-Coordinated changes in the expression of networks of genes involved in growth or homeostatic functions are essential for restoration of liver mass and survival during the post-hepatectomy period. The EASE program (see "Materials and Methods") was used to identify important biological themes over-represented among genes that were differentially expressed at 2, 16, and 40 h post-hepatectomy (32). Fig. 3 summarizes some of the most significant themes that were identified using this analysis, whereas a complete list of themes with significant scores is contained within supplemental Table 2S.
Genes involved in steroid and lipid metabolism were downregulated as early as the 2-h post-hepatectomy time point and remained decreased throughout the remainder of the time course examined (Fig. 3A). At the 16-h post-hepatectomy time point, which corresponds to hepatocyte mid-G 1 phase, genes involved in nucleotide and protein synthesis, as well as cytoskeletal organization, were up-regulated and remained upregulated 40 h post-hepatectomy. At 40 h post-surgery, when the peak of S phase is reached, several additional biological themes focused on nucleotide metabolism were over-represented, likely corresponding to the increased need of the hepatocytes during DNA replication. Taken together, the results of the EASE analysis demonstrate a shift to those transcriptional programs that encode proteins involved in DNA synthesis and mitosis, with a corresponding reduction in the expression of genes encoding proteins involved in most metabolic functions, which is summarized in the scheme shown in Fig. 3B.
Identification of Biologically Relevant Networks-Although the EASE analysis provided information regarding changes in large categories of biological processes, we were also interested to understand how individual genes were integrated into specific regulatory and signaling networks. This type of analysis has not been reported in microarray studies of the regenerating liver and revealed several findings that have not been described previously for the partial hepatectomy model. Biologically relevant networks were drawn from the lists of genes that were differentially expressed at the 2-, 16-, and 40-h time points through the use of Ingenuity Pathways Analysis (see "Materials and Methods" for details). For each time point, several major pathways were identified, and here we have shown the most significant network for each stage of liver regeneration analyzed. At 2 h post-hepatectomy, the pathway shown in Fig. 4 was identified as being the most significant, containing 28 focus genes, with a highly significant score of 48. This pathway highlights a very powerful feature of this explorative tool. Our initial PAGE and EASE analysis reported previously revealed that many genes involved in metabolism were differentially expressed 2 h post-hepatectomy (Fig. 3/ Table I). However, many of the known growth response genes that are induced in early G 1 phase were not detected as part of our initial array analysis. All of these growth response genes are contained on the PancChip M5.0 but were not detected because of the sensitivity limit of this and other microarray platforms. As shown in Fig. 4A, the pathway analysis identified Fos, JunB, JunD, and Myc as likely participants in early growth responses based on the differential expression of other genes in the same pathway. Prompted by this result, we then utilized the more sensitive QPCR analysis to show that Fos, JunB, JunD, and Myc were up-regulated 209-, 29-, 22-, and 14-fold respectively. The inclusion of these QPCR data confirmed the results of previous studies demonstrating the induction of these proto-oncogenes in the partial hepatectomy model and allowed us to identify a link between Myc and DUSP6 (MKP3), an inhibitor of extracellular signal-regulated kinase/mitogenactivated protein kinase signaling activity (34). To our knowledge, this is the first report of MKP3 expression and up-regulation post-hepatectomy in the liver.
At 16 h post-hepatectomy, the network shown in Fig. 5  contained 35 focus genes, with a highly significant score of 61. The inclusion of the QPCR expression value for Myc at this time point identified regulatory links to other genes that were differentially expressed at the 16-h post-hepatectomy time point. The Myc targets that were differentially expressed encode proteins involved in diverse processes, including cytokine signaling (ARTS-1) (35,36), matrix remodeling (SPARC) (37), and cell cycle progression (p21) (12).
The 40-h time point corresponds to the peak of hepatocyte S phase in the partial hepatectomy model. Analysis of genes differentially expressed at this point identified several regulatory networks, including pathways involved in DNA replication FIG. 3. Biological themes over-represented at various time points during liver regeneration after partial hepatectomy. A, partial list of biological themes found to be over-represented at a given time point post-hepatectomy. The biological pathways indicated were identified using EASE and are listed with the corresponding p values. A complete list of all statistically significant themes is contained in Table 2S (supplemental data). B, schema summarizing important changes in the gene expression of major biological processes during liver regeneration.

FIG. 4. Ingenuity Pathway Analysis identifies a network of genes regulated during early G 1 phase in the liver in vivo.
The network is displayed graphically as nodes (genes/gene products) and edges (the biological relationships between the nodes). The intensity of the node color indicates the degree of up (red)-or down (green)-regulation. As described in the legend provided, nodes are displayed using various shapes that represent the functional class of the gene product. Edges are displayed with various labels that describe the nature of the relationship between the nodes (A, activation; B, binding; E, expression; I, inhibition; P, phosphorylation; T, transcription). Edges without a label represent binding only. The four nodes MYC, JUNB, JUND, and FOS were identified by the pathway analysis as part of the network, and their differential gene expression was determined subsequent to the pathway analysis by QPCR. A total of 28 differentially expressed focus genes were brought into this network with a highly significant score of 48. Uncommon gene symbols shown are DDX5, dead-box polypeptide; CANX, calnexin; H3F3B, histone H3B; PCK1, phosphoenolpyruvate carboxykinase; and CEBPB, CCAAT/enhancer binding protein ␤.

FIG. 5. Ingenuity Pathway Analysis identifies a network of genes regulated during mid-G 1 phase in the liver in vivo.
The network is displayed graphically as nodes (genes/gene products) and edges (the biological relationships between the nodes). For the explanation of the symbols and letters, see the legend to Fig. 4. A total of 35 differentially expressed focus genes were brought into this network with a highly significant score of 61. Uncommon gene symbols shown are PCK1, phosphoenolpyruvate carboxykinase; CDKN1A, p21; EP300, E1A binding protein p300; HSPCA, heat shock protein 1 ␣.  (Fig. 6). Our findings underscore the requirement for coordinate regulation of cytoskeletal and chromosomal components of mitotic complexes prior to the G 2 /M phase transition (52,53).
We demonstrated here that mRNA levels for a large number of genes important for G 2 /M phase progression at the S phase peak are regulated coordinately, strongly supporting the notion that either transcriptional activation and/or posttranscriptional stability play a key role in phases of cell cycle progression beyond the G 1 phase. Although our analysis measures steady state levels of mRNA and therefore does not distinguish between these two mechanisms, several genes identified on our array, including Mad2L1, Bub1b, and Cdc2, are E2F transcriptional targets, suggesting that, similar to other growth models, E2Fs may be responsible for the upregulation of these mitotic checkpoint genes in the regenerating liver (26,54). Our ability to detect small changes in gene expression at the 40-h post-hepatectomy time point is likely to be a function of the highly synchronous nature of this in vivo model of cell cycle progression, a characteristic unique to the hepatectomy model. CONCLUSION In this study, we examined the changes in gene expression that occur following partial hepatectomy at selected time points with the aim of understanding the biological processes and specific regulatory networks that are important during early priming stages of the regenerative process and later time points corresponding to mid-G 1 and S phase. Through the use of a variety of computational analysis tools, we have identified genes that are differentially expressed during various stages of liver regeneration and have characterized biological processes and regulatory networks that provide both global and specific information regarding this highly synchronized proliferative and homeostatic response. Although the links between individual genes identified by Ingenuity pathway analysis are derived from the scientific literature, to our knowledge, this is the first description of the relevance of many of these regulatory networks in the partial hepatectomy model. In addition, we have demonstrated that the use of pathway analysis helps to overcome two problems of microarray expression profiling, which are both its limited sensitivity and the absence of certain genes on a given array platform. Finally, although transcriptional regulation is a key process by which protein function is regulated, pathway analysis can also include genes in a regulatory FIG. 6. Ingenuity Pathway Analysis identifies networks of genes regulated during S phase in the liver in vivo. The network is displayed graphically as nodes (genes/gene products) and edges (the biological relationships between the nodes). For the explanation of the symbols and letters, see the legend to Fig. 4. This network was produced by combining the two highest scoring networks with a total of 31 differentially expressed focus genes and a highly significant score of 58. Uncommon gene symbols shown are AURKB, aurora kinase B; BIRC5, survivin; CCNE1, cyclin E1; CCNE2, cyclin E2; CDKN1A, p21; CDKN1B, p27Kip1; CDKN1C, P57KIP2; CEBPB, CCAAT/enhancer binding protein ␤; DCTN1, dynactin 1; DCTN2, dynactin 2; DDIT3, GADD153; TFDP1, DP1; and YWHAQ, 14-3-3 homolog. network that are modified post-translationally or as a result of protein-protein interactions that may also be important regulatory mechanisms. The approaches taken here for the analysis of the liver regeneration paradigm can serve as a model for the dissection of many other complex biological processes.