Analyses of p53 Target Genes in the Human Genome by Bioinformatic and Microarray Approaches* □ S

,

Mutational inactivation of the p53 gene product is one of the most common genetic changes seen in human cancers. Overexpression of wild-type p53 protein in p53-deficient cells can arrest cell proliferation, reverse a tumorigenic phenotype, and sometimes induce apoptosis or differentiation (1,2). p53-dependent apoptotic pathways are critical to the development of tumor (3). The biochemical activity of p53 most closely associated with tumor suppression is its function as a transcription factor. A number of investigators have identified a consensus DNA binding sequence for wild-type p53. El-Deiry et al. (4) used p53 to select DNA fragments from genomic DNA and found that selected fragments have two copies of the decamer motif 5Ј-RRRCWWGYYY-3Ј separated by 0 to 13 base pairs of random sequence. In the motif, R ϭ G or A, W ϭ T or A, Y ϭ C or T. Several human genes whose expression is positively regulated by wild-type p53 have been identified. These include mdm2 (5), bax (6), gadd45 (7), proliferating cell nuclear antigen (8), p21/WAF1 (9), insulin-like growth factor-binding protein 3 (IGFBP3) 1 (10), the muscle creatine kinase gene, thrombospondin-1 (11), vascular smooth muscle ␣-actin gene (ACTA) (12), epidermal growth factor receptor (13), and type IV collagenase (MMP-2) (14). Although the number of identified p53 target genes keeps growing, it is conceivable that large fraction of the p53 target genes have not yet been identified. The ultimate challenge to p53 biology is to define the complete gene regulatory network. In facing this challenge, the complete human genome "draft" sequence is an invaluable resource (15, 16). Knowledge of genes containing p53 consensus binding sequence in the regulatory region will greatly assist the identification of p53 target genes, which could be potential targets for cancer chemotherapeutic drugs. Bioinformatics methods become immediately more useful if they can be integrated with high-throughput gene expression analysis on the same scale. In this report we described analysis of global p53 gene regulatory network through computational methods and genome-wide gene expression analysis.

MATERIALS AND METHODS
Computational Analysis-All noncommercial software used in these studies was written in PERL 5.0. The results of subsequent analyses were organized in a relational data base (Sybase, SQL Server Release 11.0, Emeryville, CA, Sybase Inc.).
Cell Culture-The human lung cancer cell line A549, the human ovarian cancer cell lines PA-1 and 2774, and the human osteogenic sarcoma cell line Saos2, were obtained from American Type Cell Collection (ATCC). The cell line, 2774qw1, was a single clone derived from human ovarian cancer cell line 2774. Culture conditions of all cell lines followed the instruction from ATCC.
Treatment of Cells with Recombinant Adenovirus Expressing p53 and RNA Isolation-To introduce exogenous p53, 2774qw1 cells were infected with recombinant adenovirus expressing p53 or adenovirus containing empty vector for 1 h. The recombinant adenovirus was then removed and cells were washed 1-2 times with phosphate-buffered saline (Life Technologies, Inc.). Cells were then refed with fresh culture medium and cultured at 37°C and 5% CO 2 for 2-24 h as indicated. Cells were harvested and total RNA was isolated using a CsCl purification method as described (17). Poly(A) selection was carried out using Qiagen Oligotex kit for microarray experiments.
Quantitative RT-PCR-The quantitative RT-PCR of 13 genes was performed as described previously (18). The primers and probe specific to 13 genes were designed (see Supplementary Material, Table S1). Human ␤-glucronidase gene, whose expression was not influenced by p53 across the test samples, was used as an internal positive control to ensure that equivalent amounts of RNA were included in each assay. * The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Chromatin Immunoprecipitation (ChIP) Assays-Approximately 1 ϫ 10 7 A549, PA-1 and Saos2 cells were treated with 500 nM adriamycin or Me 2 SO (only for PA-1) for 24 h, respectively. ChIP assays were performed using a modification of a protocol provided by Bruno Amati's laboratory in DNAX (19). Precleared chromatin was immunoprecipitated with a monoclonal antibody against p53 (Ab-6, Oncogene Research Products), or no antibody added as a negative control. The enrichment of specific DNA fragments was analyzed by real-time quantitative PCR on Taqman (PE 7700) with primers and probes flanking the putative p53-binding sites of each genes. Sequences of primers and probes are available in Supplementary Material, Tables S2 and S3. cDNA Microarray Hybridization-The RNA purified from 2774qw1 cells infected with rAd-p53 or rAd-control was fluorescent labeled and hybridized on 6 Incyte GeneAlbum GEM microarrays by Incyte Genomics, Inc.
Enrichment Factor Calculation-The Enrichment Factor (EF) is a parameter that estimates the extent to which genes identified in the p53 Target Data base are overrepresented in pools of genes that show changes in expression in cDNA microarray experiments. We classify genes into three categories: those directly regulated by p53, those not regulated by p53, and those indirectly regulated by p53. The numbers of genes in these categories are indicated by G NR , G D , and G IND , respectively. The ratio between p53 target genes and nontarget genes in the human genome is therefore, The fraction of p53 target genes included in the p53 Target Data base is defined as p, and the fraction of indirectly regulated and non-p53 regulated genes included in p53 Target Data base is q, such that (p, q Ͻ 1). We assume the same value of q for nonregulated and indirectly regulated genes. For the p53 Target Data base then, EF is then defined as, with a value for EF Ͼ 1 indicating bona fide enrichment for p53 target genes in the p53 Target Data base. We can use cDNA microarray to represent a large random subset of the complete genome and use hybridization experiments to determine the p53 responsive genes within the subset. The ratio of p53-responsive genes to p53 nonresponsive genes for such a set of genes is, where R random is the number of p53-responsive genes observed in microarray experiments and T random is the total number of genes on the microarrays. Similarly, the ratio of p53-responsive genes to p53 nonresponsive genes for genes in the p53 Target Database that are included on the cDNA microarray, where R target is the number of p53-responsive genes observed in the microarray experiments that are in the p53 Target Data base and T target is the total number of genes on the microarrays that are in the p53 Target Database.
Therefore, the ratio of Equations 4 and 5 is given by, Since EF ϭ p/q, EF is always greater than the ratio of R 1 /R 0 . Hence, the lower boundary for EF can be calculated by the equation.

RESULTS AND DISCUSSION
Genome-wide Identification of Human Genes Containing a p53 Consensus Sequence(s) in Their Regulatory Regions-To identify potential p53 target genes in human genomic sequences, the "FindPatterns" program (Wisconsin Sequence Analysis Package Version 10) was used to search the primate division of GenBank TM Release 120 (October, 2000) for the p53 consensus sequence, 5Ј-RRRCWWGYYY (n ϭ 0 -13) RRRCW-WGYYY-3Ј (4). In this motif, R ϭ G or A, W ϭ T or A, Y ϭ C or T, n ϭ any base. Some p53 target genes (human mdm2, IG-FBP3, and gadd45) have functional p53 response elements in their intron regions (5,7,20). The data base search therefore includes intronic sequences. We took advantage of annotated features present in GenBank TM entries to restrict matches to CAACATGTTC a These are rapidly expanding areas of the investigation and are likely to be incomplete. b The regulatory region is defined as 5Ј-flanking region (2 kilobase cutoff) and the first 3 introns. c Lowercase letters refer to the insert sequence between two decamers, whose sequence is shown in uppercase letters. the regulatory region of the genes, which consisted of 6,541 annotated human 5Ј-flanking sequences and 24,659 annotated human intron sequences. On the basis of the statistics of known p53 target genes (20), no more than 3 mismatches to the canonical consensus sequence were permitted in the selection. We collected 13 nucleotide sequences known to be functional p53-responsive elements in 10 genes; 23 out of 26 decamers bear a cytosine at position 4, and 25 out of 26 decamers have a guanine at position 7 (Table I). Hence, the sequence matches were further sorted with no mismatch allowed for C at position 4 or G at position 7. Using this approach, we identified 1,121 genes with at least one p53 binding sequence in the region 2,000 base pair upstream of the transcription start site, and 558 genes with at least one p53 binding site within the first 3 introns.
The efficiency of the computational analysis was evaluated with known p53 target genes ( Table I). The p53 response elements of bax, MMP-2, mdm2, IGFBP3, TGF-␣, and gadd45 were recognized by this analysis. Additional p53 consensus sequences were found in the promoter or intron regions of these genes. Although they have a nucleotide in the positions 4 or 7 other than C or G in their p53 response elements, proliferating cell nuclear antigen, ACTA, and epidermal growth factor receptor genes were also selected because additional binding sequence matches were found in their regulatory sequences. p21/WAF1 was not selected because its p53 response element is located more than 2 kilobases from its transcriptional start site.
This annotation-based approach is limited by the availability of annotated human gene sequences in GenBank TM . To circumvent this shortcoming, a transcript mapping approach was used to locate the promoter sequence on a genomic template. A collection of human mRNA was first extracted from the primate division of GenBank TM flat file (October, Release 120) primate division. The draft human genome sequence was used to identify or to extend the 5Ј-flanking region of many existing mRNA sequences for which the annotated gene structures are not available or are very short in their respective individual GenBank TM records. To ensure that the 5Ј end of an available mRNA is close to the transcription start site, only mRNAs that encode the NH 2 terminus of the protein were used for transcript mapping. Both finished and draft human genomic sequences were used as genomic templates. The analysis was done when ϳ94% of the human genome was deposited (63% as working draft sequence) into GenBank TM . Within the 22,034 promoters mapped in this way, 4,428 genes were identified that have at least one potential p53-binding site. For a sample of 150 promoters annotated in GenBank TM , 132 (88%) were perfectly predicted by the transcript mapping, suggesting that the transcript mapping procedure could properly predict most promoters. Nevertheless, a small portion of predicted promoters (12%) still differ from GenBank TM annotated promoters. This resulted from either a duplicated copy, a different transcript start site of the annotated gene due to gene duplication, or a 5Ј end splice variant. To remove redundancy, we collapsed all candidate genes identified by these two approaches, resulting in 4,852 candidate genes in the p53 Target Database.
Small Scale Verification of the in Silico Prediction by Taqman Analysis and Chromatin Immunoprecipitation Technique-To assess the validity of the in silico prediction, 13 genes were randomly selected from the p53 Target Database and tested for their p53 responsiveness using real-time quantitative RT-PCR. Total RNA samples were purified from human ovarian cancer cell line, 2774qw1, at various times postinfection with rAd-p53 or rAd-control. As shown in Fig. 1, the expression of myogolbin, cardiotrophin-1, catechol o-methyl-transferase, IGFBP4, and ␣1-acid glycoprotein 2 genes was induced very early (2 h post-infection of rAd-p53). In contrast, the activation of carboxyl ester lipase and L-histidine decarboxylase genes were observed relatively late (8 h post-infection). ␣-Fetoprotein and interleukin 8 receptor ␣ genes were repressed by p53 (Fig. 1A). It is interesting to note that although it has not previously been demonstrated for human ␣-fetoprotein, expression of the mouse homologue of the ␣-fetoprotein gene is repressed by p53 through sequence specific DNA binding in the promoter (21). ATP synthase-␤ and mitochondrial ATP synthase C genes slightly changed at the mRNA level following the expression of p53 (data not shown). It is also interesting to note that ␣1-acid glycoprotein 2 and complement component C1-inhibitor genes, which only have p53 DNA consensus sequences in their introns, were also activated by p53 (Fig. 1), confirming that sequences of introns should be included in searching for p53 DNA binding consensus sequences.
The 11 genes shown to respond to p53 at the mRNA level in Fig. 1 contain p53-binding sites in their regulatory regions, suggesting that they might be directly bound by p53. To test this, we took advantage of the chromatin immunoprecipitation technique that allows us to study p53 protein-DNA interaction in vivo. Five genes that respond to p53, L-histidine decarboxylase, ␣1-acid glycoprotein 2, cathechol-o-methyltransferase, myoglobin, and cardiotrophin-1, were evaluated in this way. One or two putative p53-binding sites were selected to design site-specific primers for each gene. A human PA-1 ovarian carcinoma cell line was used because it expresses endogenous functional wild-type p53 (22), which is induced by the treatment with the DNA damaging agent adriamycin. Using antibodies against p53, chromatin was immunoprecipitated from PA-1 cells treated with 500 nM adriamycin or Me 2 SO for 24 h. The ChIP assays showed that genomic fragments containing p53-binding sites for all sites tested were enriched after immunoprecipitation using the p53-specific antibody, and the enrichment after treatment of adriamycin are greater than control samples ( Fig. 2A). Negligible quantities of chromatin were recovered when no antibody was used. Similar results were observed with the lung carcinoma cell line, A549, which expresses wild type p53 (Fig. 2B). Furthermore, only background levels of the promoter fragments were precipitated from a human osteogenic sarcoma cell line, Saos2 (null for p53), in the presence of p53 antibody (Fig. 2B). Two known p53 target genes, p21/ WAF1 and proliferating cell nuclear antigen, were checked for the interactions of p53 with their promoters and were both positive (data not shown). The results demonstrated that endogenous p53 protein interacts directly with predicted p53 binding sequences in these 5 genes, and that in silico prediction can greatly assist us in focusing on regions likely to confer p53-mediated expression. The biological properties of these genes are consistent with the functions that have been attributed to p53, such as, cell cycle checkpoints, apoptosis, inhibition of angiogenesis, and genetic stability (2,(23)(24)(25). The data suggest that these genes could be mediators of p53 expression through transcriptional regulation.
Integration of the p53 Target Database and Genome-Wide Expression Analysis-High-density DNA microarrays appear to be the sole approach to study gene expression at the scale comparable to that of the p53 Target Data base. In an attempt to identify p53 target genes on a genomic scale, Incyte Gene-Album microarrays (ϳ60,000 cDNAs) were hybridized with the same batch of mRNA used in Taqman analysis described above. 1,501 genes (4.4%) were found to be responsive to p53 using a 2.5-fold change up or down as a cutoff. Approximately 80% of these were repressed by p53. 2 Here we focus on the overlap of p53 target genes predicted in silico with those identified by hybridization to microarrays. However, there was no single established method to estimate the significance of an observed degree of in silico prediction. Accordingly, we developed the EF to test the validity of our in silico predictions. T 0 is the ratio of genes directly regulated by p53 in human genome to those that are not regulated by p53. T 1 is the similar ratio for genes in our p53 Target Database. The EF is defined as the comparison of these ratios (EF ϭ T 1 /T 0 ), such that an EF Ͼ 1 indicates enrichment for genes regulated by p53 in the p53 Target Database. Since the actual number of genes in the human genome are not known, we used the genes on the microarrays as a large representative set for this analysis. As described under "Materials and Methods," the low boundary for the EF can be estimated from the following equation, where R target is the number of p53-responsive genes observed in the microarray experiments that are in the p53 Target Data base, T target is the total number of genes on the microarrays that are in the p53 Target Data base, R random is the number of p53-responsive genes observed in microarray experiments, and T random is the total number of genes on the microarrays. The six Incyte GeneAlbum microarrays used in hybridization represent 33,615 individual genes (T random ) based on "assembling" of the sequences into gene-based groups. Of 4,852 unique genes in the p53 Target Database, 3,387 of them (T target ) are represented on the GeneAlbum microarrays. In the microarray experiments using 2774qw1 ovarian cancer cells infected with rAd-p53, 217 genes (R random ) showed activation following p53 expression. Of those, 68 genes (R target ) were found to have at least one p53 DNA binding sequence in their regulatory region, and the resulting EF is greater than 3.2. It suggests that the p53 Target Database enriched p53 direct target genes at least 3.2-fold compared with a randomly selected gene pool. We observed that 1,205 genes (R random ) are repressed following p53 expression, suggesting that transcriptional repression is also an important mechanism through which p53 exerts its function. Of those, 296 genes (R target ) that are represented in the p53 Target Database, and the EF for the repressed genes was estimated to be 2.6.
It is well known that p53 mediates transcriptional activation through sequence specific DNA binding (5,7,9,10,12,14). In contrast to transcriptional activation, the molecular basis of transcriptional repression by p53 is poorly understood. This is because most genes that have been reported were repressed by p53 have no classical p53-binding sites in their promoter. In general, genes repressed by p53 can be classified into two types. For the first type of genes, the mechanism is ascribed to sequestration of components of the basal transcription machinery by p53 through protein-protein interactions in the absence of direct DNA binding (26 -28). However, p53 mediated repression of target gene expression by direct DNA binding has been demonstrated in several instances (21,29,30). In these cases, p53 binds to the DNA-binding site, which overlaps the binding site of another transactivator protein. The repression by p53 results from displacement of the activator binding. Our data gave higher EFs than background for both activated and repressed genes, indicating that p53-activated genes as well as p53-repressed genes, at least some of them, are directly regulated through p53 specific DNA binding. The EF for p53-repressed genes is lower than that for p53-activated genes, supporting the notion that repression by p53 has multiple mechanisms with or without requirement of sequence specific DNA binding by p53. This could result in "dilution" of overall EF values for repressed genes. Obviously, the complexity of regulation of p53 repressed genes is significant and requires further study.
To verify these observations, another data set of microarray experiment, derived from human leiomyosarcoma cell line, 2 S. Liu, manuscript in preparation.
FIG. 3. EF temporal profiles of genes activated and repressed by p53. Human ovarian cancer cell line, 2774qw1, was infected with 10 10 particle/ml rAd-p53 or rAd-control for 1 h. Total RNA was extracted at the indicated times post-infection and applied to hybridization on Incyte GeneAlbum microarrays as described under "Materials and Methods." The p53 responsive genes are classified based on the initial time points at which at least 2.5-fold differential expression is detected on microarrays. The activated and repressed genes by p53 are plotted separately.
SKUT-1, infected with rAd-p53 or rAd-control, 3 was subjected to the EF analysis. Similar trends were observed (data not shown), suggesting that the EF is a valid measure for the selectivity of in silico prediction and is sensitive enough to reflect the specificity of p53-binding sites.
In principle, genes directly regulated by p53 should respond to p53 expression more quickly than secondary responsive genes. However, it is difficult to determine the best sampling time to enrich for p53 direct target genes, since an early response might only become detectable at later time points with the accumulation of transcripts. With the help of the p53 Target Database, we estimated the relative EFs for genes differentially expressed at different time points following p53 expression in 2774QW1 (Fig. 3). For the p53-activated genes, we observed that EF remains above 2.5 over the time course. This suggests that p53 direct target genes are still enriched in relatively late time points (16 -20 h post-infection) in addition to the early time points (4 h post-infection). This may result from a consistently high level of p53, which provides continuous activation of p53 target genes. The expression of exogenous p53 in 2774qw1 cells starts as early as 2 h post-infection, reaches peak at 4 h post-infection, and stays high until 24 h post-infection. 4 This is also supported by the observation that the activation of L-histidine decarboxylase gene expression, whose predicted p53 binding sequence is bound by p53 in vivo, was observed after 8 h post-infection ( Figs. 1 and 2).
Unlike the activated genes, the EF for repressed genes starts off with a very low level, close to the background (EF ϭ 1). Then it gradually increases and reaches a peak (EF ϭ 3) around 12 h post-infection and then gradually decreases to the background level. Based on the EF analysis, p53-repressed genes respond to p53 slower than activated genes. As discussed above, lower EF value may be due to the mixed regulatory mechanisms of repression. Most p53-activated genes are regulated by direct p53 binding whereas repressed genes require competitive DNA binding of p53 with another activator protein if the repression is mediated by specific DNA binding, which may explain why repression was delayed as compared with activation. In addition, the activated and repressed genes display quite different temporal patterns (Fig. 3). Interestingly, we observed that the apoptotic response started at 12 h post-infection of rAd-p53, but not rAd-control, 4 when the repressed genes reached the highest EF, indicating a potential connection between p53mediated gene repression and apoptosis. This is consistent with other reports that p53-induced apoptosis through the repression of transcription but not through the transactivation (32,33). 4 Whereas, p53-mediated transactivation is known to be related to genes important in both cell cycle progression and apoptosis (6,7,9,(12)(13)(14).
The p53 Target Database could not only be used to delineate p53 target genes, but also could assist to identify other signaling pathways that are potentially connected to the p53 pathway. Assuming p53 mediates certain growth factor-induced signals, the differential expression data derived from growth factor stimulation experiments would give increased EF values since p53 target genes would be part of the effected genes. Perou et al. (34) used human normal mammary epithelial cells subjected to a set of experimental perturbations, including addition of TGF-␤1 for 24 h, withdrawal of EGF for 2 days, addition of interferon alpha (IFN-␣) for 24 h, and addition of IFN-␥ for 24 h. To evaluate whether p53 mediates the signaling induced by these growth factors, the microarray data (using 2.5-fold change as a cutoff) was linked with the p53 Target Database, and subjected to EF analyses. Interestingly, the TGF-␤1 treatment (24 h) data showed a relative high EF (2.3) whereas the data from other treatments gave EFs close to 1. When the list of genes effected by TGF-␤1 was closely examined, it was interesting to note that 16 out of 22 genes have p53 binding sequences in their regulatory region (two of them have no genomic sequence available at this moment, Table II). The fact that the majority of TGF-␤1-induced genes contain p53binding sites suggests that p53 might be a mediator of the growth suppressive effects of TGF-␤1. This is supported by the evidence that p53 expression was induced during TGF-␤1 initiated growth inhibition and apoptosis (35). The methods developed here are not limited to identifying the p53 gene regulatory network, but can also be applied to any other transcription factors of interest. We have extended the computational analysis of human genomic sequences to identify DNA-binding sites for forkhead transcription factor, which can be phosphorylated by Akt/PBC, a serine-threonine kinase that plays a critical role in cell survival signaling. TGF-␤2 was identified in this way and confirmed experimentally as a transcriptional target of forkhead transcription factor (36). The observation that TGF-␤2 gene expression is repressed in tumors expressing activated Akt suggests a new mechanism to abrogate the growth inhibition or apoptotic effects (or both) of TGF-␤2. In summary, we believe that the integration of human genomic sequence information and high-throughput gene expression analysis for all transcription factors in future will greatly assist us to define global gene regulatory networks.